Paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a sentence-transformers model finetuned from NovaSearch/stella_en_400M_v5. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for retrieval.
```
SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'NewModel'})
  (1): Pooling({'embedding_dimension': 1024, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Dense({'in_features': 1024, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity', 'module_input_name': 'sentence_embedding', 'module_output_name': 'sentence_embedding'})
)
```
First install the Sentence Transformers library:
```shell
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Gem-Software/stella-en-400m-v5-gem-v5-hyde")

# Run inference
sentences = [
    'Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: Senior Engineering Manager at Airbnb (2020-Present), Engineering Manager at Uber (2016-2020), Software Engineer at Amazon (2013-2016) | BASc Computer Engineering, University of Waterloo | Search Infrastructure, Machine Learning, Distributed Systems, Elasticsearch, Team Leadership, Ranking Algorithms, A/B Testing, Personalization, Roadmap Planning | Engineering leader with 11 years building search and discovery products at marketplace scale, currently leading a team of 20 engineers at Airbnb focused on search infrastructure and personalized ranking systems.',
    'Yahoo | Technical Lead - Social Patform | Worked with Yahoo Lab to develop identity mapping algorithm to unify various social graphs like Yahoo, Facebook, Twitter and Flickr to create a unified view of a person on internet.Technologies: Hadoop,HBase, Pig | Yahoo | Principal Engineer - Conversational Assistant | Developed interactive Natural Language Understanding Platform for chatbots. Platform provides Intent classification, Entity Detection, Dialog understanding, Slot filling, Domain detection etcWave of Yahoo Bots on Kik and Facebook heavily relied on this platform for online active learning.Technologies: Spark ML, Weka, Stanford NLP, CRF++ | Amazon | Senior Engineering Manager - Alexa Video | I help build personalized voice search and discovery experience on devices like Fire TV and Echo Show. When you ask Alexa to play a video e.g. play the 83 world cup movie, play something on Netflix, play the Seahawks game or tune to oscars etc., it will be my teams behind indexing, searching and ranking to select "the" video entity you are interested to watch. Similarly, when you are in an ambient mode on these devices and see the latest season of Marvelous Mrs. Maisel, a continue watching carousal or content similar to what you have watched pop-up on your screen, its highly likely that they are developed by my team. We are invested in developing (1) real time indexing solutions to several hundred million entities (2) state of the art deep learning based information retrieval and personalization ranking solutions and (3) low latency and highly available ML services that powers millions concurrent users. | Citrix | Lead Development Engineer | Developed Web Publishing Platform for citrix.com | Yahoo | Technical Lead- User Reputation | Developed platform to compute global and category wise user reputation scores. These scores were used as signals for content personalization, comment ranking , abuse detection and customer care | Amazon | Software Development Manager- Alexa Info',
    'Esri | Product Engineering Intern | | Georgia Institute of Technology | Student | | ServiceNow | Application Developer | • Full-stack development of a productivity application extension for team schedule management. Intended for production.<br> • Designed database schema structure to efficiently handle concurrent operations for hundreds of team members<br> • REST API development in JavaScript using ServiceNow internal tooling to support application operations<br> • Front-end development using internal codeless platform as well as ServiceNow internal tool (SEISMIC/Tectonic) similar to React. <br> • Also built internal tool for finding Zoom meeting timestamps with transcripts relevant to user’s search term. | Meta | Software Engineer | Working across the stack in ads and ad delivery <br><br>- Native calling for lead generation ad products<br>- Machine learning methods for related ads <!----> | Stealth | Software Engineer | Making an AI assistant for friend groups <!----> | Georgia Tech college of computing | Teaching Assistant | As a teaching assistant for CS3510, GT\'s algorithm design and analysis course, I:<br>- Hold weekly office hours for students who want to better explore and understand algorithm design and analysis<br>- Host and answer discussions online pertaining to class material<br>- Grade homework and tests for 200 students <!----> | Georgia Institute of Technology | Teaching Assistant | | Retool | Software Engineer | AI agents <!----> | SWE + AI + ML. Retool, Meta | "Once you know that you can work with purpose, it becomes hard to work without it." | | | | ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8998, 0.7761],
#         [0.8998, 1.0000, 0.8581],
#         [0.7761, 0.8581, 1.0000]])
```
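By default, `model.similarity` computes cosine similarity between the embeddings, which is why the diagonal of the matrix above is exactly 1. As a minimal NumPy sketch of the same computation (using toy 4-dimensional vectors standing in for the real 1024-dimensional embeddings):

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    # Normalize each row to unit length, then take pairwise dot products.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    return normalized @ normalized.T

# Toy 4-dimensional embeddings in place of the real 1024-dimensional ones.
emb = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
sims = cosine_similarity_matrix(emb)
print(np.round(sims, 4))
# Each sentence is perfectly similar to itself (diagonal of 1.0);
# off-diagonal entries fall in [-1, 1].
```

For retrieval, you would compare one query embedding against many document embeddings and rank documents by this score, exactly as the model's `similarity` method does.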
Training dataset

Columns: sentence1, sentence2, and score

| | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |

Samples:

| sentence1 | sentence2 | score |
|---|---|---|
| Instruct: Given a web search query, retrieve relevant passages that answer the query. | Berkeley Unified School DIsrict \| Educator \| \| WCCUSD \| Educator \| \| Educator at WCCUSD \| \| \| \| \| \| | 0.0 |
| Instruct: Given a web search query, retrieve relevant passages that answer the query. | Vkan Tech Solutions \| Software Developer | |
| Instruct: Given a web search query, retrieve relevant passages that answer the query. | Uber \| Senior Staff Software Engineer, TLM | |
Loss: CosineSimilarityLoss with these parameters:

```json
{
    "loss_fct": "torch.nn.modules.loss.MSELoss",
    "cos_score_transformation": "torch.nn.modules.linear.Identity"
}
```
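With these parameters, the loss regresses the cosine similarity of each (sentence1, sentence2) embedding pair against the gold score: the similarity passes through the identity transformation and is compared to the label with MSE. A minimal NumPy sketch of this computation (an illustration of the idea, not the library's PyTorch implementation):

```python
import numpy as np

def cosine_similarity_mse_loss(emb1, emb2, gold_scores):
    # Cosine similarity of each (sentence1, sentence2) embedding pair.
    cos = np.sum(emb1 * emb2, axis=1) / (
        np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)
    )
    # cos_score_transformation is Identity, so the similarity is used as-is;
    # loss_fct (MSELoss) compares it to the gold similarity scores.
    return np.mean((cos - gold_scores) ** 2)

# Toy 2-dimensional embeddings: the first pair is identical, the second orthogonal.
emb1 = np.array([[1.0, 0.0], [0.0, 1.0]])
emb2 = np.array([[1.0, 0.0], [1.0, 0.0]])
gold = np.array([1.0, 0.0])  # labels like the 0.0 / 0.5 scores in the samples above
loss = cosine_similarity_mse_loss(emb1, emb2, gold)
print(loss)  # predictions match the labels exactly here, so the loss is 0.0
```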
Evaluation dataset

Columns: sentence1, sentence2, and score

| | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |

Samples:

| sentence1 | sentence2 | score |
|---|---|---|
| Instruct: Given a web search query, retrieve relevant passages that answer the query. | Direct Current Co., Ltd. \| Operating Department Intern | |
| Instruct: Given a web search query, retrieve relevant passages that answer the query. | Roots Industries India \| Data Science Intern \| During my internship at Roots Industries India Private Limited, I developed a robust forecasting model to predict product sales quantities for future years using Python and a dataset containing over 3 lakh records of sales data. Under my mentor's guidance, I implemented an ARIMA model for time series forecasting, leveraging its effectiveness in capturing trends and seasonality. When the ARIMA model faced performance challenges, I integrated exponential smoothing to enhance predictive accuracy. \| Student at Amrita Vishwa Vidyapeetham \| \| \| \| 5 days workshop on Cricket Analytics\|Introduction to Data Analysis using Microsoft Excel \| \| | 0.5 |
| Instruct: Given a web search query, retrieve relevant passages that answer the query. | TotalEnergies \| Console Operator \| Alkylation/Cogeneration/Process Water Treatment Center \| Console Operator @ TotalEnergies \| Industrial Technology \| Experienced Console Operator; seeking a Supervisor Role. | 0.0 |
Loss: CosineSimilarityLoss with these parameters:

```json
{
    "loss_fct": "torch.nn.modules.loss.MSELoss",
    "cos_score_transformation": "torch.nn.modules.linear.Identity"
}
```
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 4
- learning_rate: 2e-05
- warmup_ratio: 0.1
- bf16: True
- load_best_model_at_end: True
- gradient_checkpointing: True
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 4
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

Training logs:

| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 0.0736 | 25 | 0.0686 | - |
| 0.1473 | 50 | 0.0467 | - |
| 0.2209 | 75 | 0.0371 | - |
| 0.2946 | 100 | 0.0332 | 0.0444 |
| 0.3682 | 125 | 0.0346 | - |
| 0.4418 | 150 | 0.0327 | - |
| 0.5155 | 175 | 0.031 | - |
| 0.5891 | 200 | 0.0304 | 0.0376 |
| 0.6627 | 225 | 0.0303 | - |
| 0.7364 | 250 | 0.0323 | - |
| 0.8100 | 275 | 0.0305 | - |
| 0.8837 | 300 | 0.0289 | 0.0371 |
| 0.9573 | 325 | 0.0302 | - |
| 1.0295 | 350 | 0.0283 | - |
| 1.1031 | 375 | 0.024 | - |
| 1.1767 | 400 | 0.0215 | 0.0391 |
| 1.2504 | 425 | 0.0214 | - |
| 1.3240 | 450 | 0.0226 | - |
| 1.3976 | 475 | 0.0227 | - |
| 1.4713 | 500 | 0.022 | 0.0392 |
| 1.5449 | 525 | 0.024 | - |
| 1.6186 | 550 | 0.0224 | - |
| 1.6922 | 575 | 0.0228 | - |
| 1.7658 | 600 | 0.0225 | 0.035 |
| 1.8395 | 625 | 0.0226 | - |
| 1.9131 | 650 | 0.021 | - |
| 1.9867 | 675 | 0.0206 | - |
| 2.0589 | 700 | 0.0188 | 0.0365 |
| 2.1325 | 725 | 0.0167 | - |
| 2.2062 | 750 | 0.0153 | - |
| 2.2798 | 775 | 0.0173 | - |
| 2.3535 | 800 | 0.0157 | 0.0358 |
| 2.4271 | 825 | 0.016 | - |
| 2.5007 | 850 | 0.0157 | - |
| 2.5744 | 875 | 0.0159 | - |
| 2.6480 | 900 | 0.016 | 0.0369 |
| 2.7216 | 925 | 0.0159 | - |
| 2.7953 | 950 | 0.0158 | - |
| 2.8689 | 975 | 0.0161 | - |
| 2.9426 | 1000 | 0.0163 | 0.0360 |
| 3.0 | 1020 | - | 0.0350 |
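The step counts in the table follow from the batch settings: a per-device batch size of 16 with 4 gradient-accumulation steps gives an effective batch size of 64, and the warmup_ratio of 0.1 over the 1020 total optimizer steps corresponds to roughly 102 warmup steps. A back-of-the-envelope check, assuming single-device training (the device count is not stated in this card):

```python
per_device_train_batch_size = 16
gradient_accumulation_steps = 4
num_devices = 1  # assumption: single-GPU training

# Samples consumed per optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # 64

total_steps = 1020  # final step count from the training-logs table
warmup_ratio = 0.1
warmup_steps = int(total_steps * warmup_ratio)
print(warmup_steps)  # 102
```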
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
Base model: NovaSearch/stella_en_400M_v5