Paper: Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks (arXiv:1908.10084)
This is a sentence-transformers model finetuned from NovaSearch/stella_en_400M_v5. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for retrieval.
```
SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'NewModel'})
  (1): Pooling({'embedding_dimension': 1024, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Dense({'in_features': 1024, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity', 'module_input_name': 'sentence_embedding', 'module_output_name': 'sentence_embedding'})
)
```
First install the Sentence Transformers library:
```shell
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Gem-Software/stella-en-400m-v5-gem-v5-hyde")

# Run inference
sentences = [
    'Instruct: Given a web search query, retrieve relevant passages that answer the query.\nQuery: Senior Engineering Manager at Airbnb (2020-Present), Engineering Manager at Uber (2016-2020), Software Engineer at Amazon (2013-2016) | BASc Computer Engineering, University of Waterloo | Search Infrastructure, Machine Learning, Distributed Systems, Elasticsearch, Team Leadership, Ranking Algorithms, A/B Testing, Personalization, Roadmap Planning | Engineering leader with 11 years building search and discovery products at marketplace scale, currently leading a team of 20 engineers at Airbnb focused on search infrastructure and personalized ranking systems.',
    'Yahoo | Technical Lead - Social Patform | Worked with Yahoo Lab to develop identity mapping algorithm to unify various social graphs like Yahoo, Facebook, Twitter and Flickr to create a unified view of a person on internet.Technologies: Hadoop,HBase, Pig | Yahoo | Principal Engineer - Conversational Assistant | Developed interactive Natural Language Understanding Platform for chatbots. Platform provides Intent classification, Entity Detection, Dialog understanding, Slot filling, Domain detection etcWave of Yahoo Bots on Kik and Facebook heavily relied on this platform for online active learning.Technologies: Spark ML, Weka, Stanford NLP, CRF++ | Amazon | Senior Engineering Manager - Alexa Video | I help build personalized voice search and discovery experience on devices like Fire TV and Echo Show. When you ask Alexa to play a video e.g. play the 83 world cup movie, play something on Netflix, play the Seahawks game or tune to oscars etc., it will be my teams behind indexing, searching and ranking to select "the" video entity you are interested to watch. Similarly, when you are in an ambient mode on these devices and see the latest season of Marvelous Mrs. Maisel, a continue watching carousal or content similar to what you have watched pop-up on your screen, its highly likely that they are developed by my team. We are invested in developing (1) real time indexing solutions to several hundred million entities (2) state of the art deep learning based information retrieval and personalization ranking solutions and (3) low latency and highly available ML services that powers millions concurrent users. | Citrix | Lead Development Engineer | Developed Web Publishing Platform for citrix.com | Yahoo | Technical Lead- User Reputation | Developed platform to compute global and category wise user reputation scores. These scores were used as signals for content personalization, comment ranking , abuse detection and customer care | Amazon | Software Development Manager- Alexa Info',
    'Esri | Product Engineering Intern | | Georgia Institute of Technology | Student | | ServiceNow | Application Developer | • Full-stack development of a productivity application extension for team schedule management. Intended for production.<br> • Designed database schema structure to efficiently handle concurrent operations for hundreds of team members<br> • REST API development in JavaScript using ServiceNow internal tooling to support application operations<br> • Front-end development using internal codeless platform as well as ServiceNow internal tool (SEISMIC/Tectonic) similar to React. <br> • Also built internal tool for finding Zoom meeting timestamps with transcripts relevant to user’s search term. | Meta | Software Engineer | Working across the stack in ads and ad delivery <br><br>- Native calling for lead generation ad products<br>- Machine learning methods for related ads <!----> | Stealth | Software Engineer | Making an AI assistant for friend groups <!----> | Georgia Tech college of computing | Teaching Assistant | As a teaching assistant for CS3510, GT\'s algorithm design and analysis course, I:<br>- Hold weekly office hours for students who want to better explore and understand algorithm design and analysis<br>- Host and answer discussions online pertaining to class material<br>- Grade homework and tests for 200 students <!----> | Georgia Institute of Technology | Teaching Assistant | | Retool | Software Engineer | AI agents <!----> | SWE + AI + ML. Retool, Meta | "Once you know that you can work with purpose, it becomes hard to work without it." | | | | ',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8998, 0.7761],
#         [0.8998, 1.0000, 0.8581],
#         [0.7761, 0.8581, 1.0000]])
```
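By default, `model.similarity` computes cosine similarity between the embeddings, which is why the diagonal of the matrix above is exactly 1. As a minimal NumPy sketch of the same computation (using toy 4-dimensional vectors standing in for the real 1024-dimensional embeddings):

```python
import numpy as np

def cosine_similarity_matrix(embeddings):
    # Normalize each row to unit length, then take pairwise dot products.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / norms
    return normalized @ normalized.T

# Toy 4-dimensional embeddings in place of the real 1024-dimensional ones.
emb = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])
sims = cosine_similarity_matrix(emb)
print(np.round(sims, 4))
# Each sentence is perfectly similar to itself (diagonal of 1.0);
# off-diagonal entries fall in [-1, 1].
```

For retrieval, you would compare one query embedding against many document embeddings and rank documents by this score, exactly as the model's `similarity` method does.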
Training dataset

Columns: sentence1, sentence2, and score

| | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |

Samples:

| sentence1 | sentence2 | score |
|---|---|---|
| Instruct: Given a web search query, retrieve relevant passages that answer the query. | Berkeley Unified School DIsrict \| Educator \| \| WCCUSD \| Educator \| \| Educator at WCCUSD \| \| \| \| \| \| | 0.0 |
| Instruct: Given a web search query, retrieve relevant passages that answer the query. | Vkan Tech Solutions \| Software Developer | |
| Instruct: Given a web search query, retrieve relevant passages that answer the query. | Uber \| Senior Staff Software Engineer, TLM | |
Loss: CosineSimilarityLoss with these parameters:

```json
{
    "loss_fct": "torch.nn.modules.loss.MSELoss",
    "cos_score_transformation": "torch.nn.modules.linear.Identity"
}
```
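With these parameters, the loss regresses the cosine similarity of each (sentence1, sentence2) embedding pair against the gold score: the similarity passes through the identity transformation and is compared to the label with MSE. A minimal NumPy sketch of this computation (an illustration of the idea, not the library's PyTorch implementation):

```python
import numpy as np

def cosine_similarity_mse_loss(emb1, emb2, gold_scores):
    # Cosine similarity of each (sentence1, sentence2) embedding pair.
    cos = np.sum(emb1 * emb2, axis=1) / (
        np.linalg.norm(emb1, axis=1) * np.linalg.norm(emb2, axis=1)
    )
    # cos_score_transformation is Identity, so the similarity is used as-is;
    # loss_fct (MSELoss) compares it to the gold similarity scores.
    return np.mean((cos - gold_scores) ** 2)

# Toy 2-dimensional embeddings: the first pair is identical, the second orthogonal.
emb1 = np.array([[1.0, 0.0], [0.0, 1.0]])
emb2 = np.array([[1.0, 0.0], [1.0, 0.0]])
gold = np.array([1.0, 0.0])  # labels like the 0.0 / 0.5 scores in the samples above
loss = cosine_similarity_mse_loss(emb1, emb2, gold)
print(loss)  # predictions match the labels exactly here, so the loss is 0.0
```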
Evaluation dataset

Columns: sentence1, sentence2, and score

| | sentence1 | sentence2 | score |
|---|---|---|---|
| type | string | string | float |

Samples:

| sentence1 | sentence2 | score |
|---|---|---|
| Instruct: Given a web search query, retrieve relevant passages that answer the query. | Direct Current Co., Ltd. \| Operating Department Intern | |
| Instruct: Given a web search query, retrieve relevant passages that answer the query. | Roots Industries India \| Data Science Intern \| During my internship at Roots Industries India Private Limited, I developed a robust forecasting model to predict product sales quantities for future years using Python and a dataset containing over 3 lakh records of sales data. Under my mentor's guidance, I implemented an ARIMA model for time series forecasting, leveraging its effectiveness in capturing trends and seasonality. When the ARIMA model faced performance challenges, I integrated exponential smoothing to enhance predictive accuracy. \| Student at Amrita Vishwa Vidyapeetham \| \| \| \| 5 days workshop on Cricket Analytics\|Introduction to Data Analysis using Microsoft Excel \| \| | 0.5 |
| Instruct: Given a web search query, retrieve relevant passages that answer the query. | TotalEnergies \| Console Operator \| Alkylation/Cogeneration/Process Water Treatment Center \| Console Operator @ TotalEnergies \| Industrial Technology \| Experienced Console Operator; seeking a Supervisor Role. | 0.0 |
Loss: CosineSimilarityLoss with these parameters:

```json
{
    "loss_fct": "torch.nn.modules.loss.MSELoss",
    "cos_score_transformation": "torch.nn.modules.linear.Identity"
}
```
Non-default hyperparameters:

- eval_strategy: steps
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 4
- learning_rate: 2e-05
- warmup_ratio: 0.1
- bf16: True
- load_best_model_at_end: True
- gradient_checkpointing: True
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 16
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 4
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: None
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- parallelism_config: None
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- project: huggingface
- trackio_space_id: trackio
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- hub_revision: None
- gradient_checkpointing: True
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: no
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- liger_kernel_config: None
- eval_use_gather_object: False
- average_tokens_across_devices: True
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
- router_mapping: {}
- learning_rate_mapping: {}

Training logs:

| Epoch | Step | Training Loss | Validation Loss |
|---|---|---|---|
| 0.0736 | 25 | 0.0686 | - |
| 0.1473 | 50 | 0.0467 | - |
| 0.2209 | 75 | 0.0371 | - |
| 0.2946 | 100 | 0.0332 | 0.0444 |
| 0.3682 | 125 | 0.0346 | - |
| 0.4418 | 150 | 0.0327 | - |
| 0.5155 | 175 | 0.031 | - |
| 0.5891 | 200 | 0.0304 | 0.0376 |
| 0.6627 | 225 | 0.0303 | - |
| 0.7364 | 250 | 0.0323 | - |
| 0.8100 | 275 | 0.0305 | - |
| 0.8837 | 300 | 0.0289 | 0.0371 |
| 0.9573 | 325 | 0.0302 | - |
| 1.0295 | 350 | 0.0283 | - |
| 1.1031 | 375 | 0.024 | - |
| 1.1767 | 400 | 0.0215 | 0.0391 |
| 1.2504 | 425 | 0.0214 | - |
| 1.3240 | 450 | 0.0226 | - |
| 1.3976 | 475 | 0.0227 | - |
| 1.4713 | 500 | 0.022 | 0.0392 |
| 1.5449 | 525 | 0.024 | - |
| 1.6186 | 550 | 0.0224 | - |
| 1.6922 | 575 | 0.0228 | - |
| 1.7658 | 600 | 0.0225 | 0.035 |
| 1.8395 | 625 | 0.0226 | - |
| 1.9131 | 650 | 0.021 | - |
| 1.9867 | 675 | 0.0206 | - |
| 2.0589 | 700 | 0.0188 | 0.0365 |
| 2.1325 | 725 | 0.0167 | - |
| 2.2062 | 750 | 0.0153 | - |
| 2.2798 | 775 | 0.0173 | - |
| 2.3535 | 800 | 0.0157 | 0.0358 |
| 2.4271 | 825 | 0.016 | - |
| 2.5007 | 850 | 0.0157 | - |
| 2.5744 | 875 | 0.0159 | - |
| 2.6480 | 900 | 0.016 | 0.0369 |
| 2.7216 | 925 | 0.0159 | - |
| 2.7953 | 950 | 0.0158 | - |
| 2.8689 | 975 | 0.0161 | - |
| 2.9426 | 1000 | 0.0163 | 0.0360 |
| 3.0 | 1020 | - | 0.0350 |
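The step counts in the table follow from the batch settings: a per-device batch size of 16 with 4 gradient-accumulation steps gives an effective batch size of 64, and the warmup_ratio of 0.1 over the 1020 total optimizer steps corresponds to roughly 102 warmup steps. A back-of-the-envelope check, assuming single-device training (the device count is not stated in this card):

```python
per_device_train_batch_size = 16
gradient_accumulation_steps = 4
num_devices = 1  # assumption: single-GPU training

# Samples consumed per optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # 64

total_steps = 1020  # final step count from the training-logs table
warmup_ratio = 0.1
warmup_steps = int(total_steps * warmup_ratio)
print(warmup_steps)  # 102
```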
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
Base model: NovaSearch/stella_en_400M_v5