--- language: - en license: apache-2.0 tags: - sentence-transformers - sentence-similarity - feature-extraction - generated_from_trainer - dataset_size:6300 - loss:MatryoshkaLoss - loss:MultipleNegativesRankingLoss base_model: BAAI/bge-base-en widget: - source_sentence: 'Employee health, safety and wellness are top priorities at Hasbro. We support our colleagues’ well-being, which includes mental, physical and financial wellness, through a number of programs, including: robust employee assistance programs, childcare solutions, and a commitment to flexible work arrangements.' sentences: - What percentage of the total annual net trade sales did the sales returns reserve represent for the company during each of the fiscal years 2023, 2022, and 2021? - How does Hasbro support the wellness of its employees? - What was the conclusion of the Company's review regarding the impact of the American Rescue Plan, the Consolidated Appropriations Act, 2021, and related tax provisions on its business for the fiscal year ended June 30, 2023? - source_sentence: The Company has a minority market share in the global smartphone, personal computer and tablet markets. The Company faces substantial competition in these markets from companies that have significant technical, marketing, distribution and other resources, as well as established hardware, software and digital content supplier relationships. In addition, some of the Company’s competitors have broader product lines, lower-priced products and a larger installed base of active devices. Competition has been particularly intense as competitors have aggressively cut prices and lowered product margins. sentences: - When did The Hershey Company declare the dividend that was paid on March 15, 2023? - What factors contribute to the Company facing substantial competition in the markets for smartphones, personal computers, and tablets? - How is goodwill impairment analyzed? 
- source_sentence: During fiscal 2022, there were cash payments of $6.7 billion for repurchases of common stock through open market purchases. sentences: - What was the value of cash payments for common stock repurchases through open market purchases during fiscal 2022? - How much did the Compute & Networking segment's gross margin decrease in fiscal year 2023? - What different methods does Amazon use to engage and retain employees? - source_sentence: Walmart Luminate provides a suite of data products for merchants and suppliers. sentences: - What pages do the Consolidated Financial Statements and their accompanying Notes and reports appear on in the document? - What was the percentage change in NYSE total cash handled volume from 2022 to 2023? - What is the function of Walmart Luminate? - source_sentence: Item 8. Financial Statements and Supplementary Data. The Consolidated Financial Statements, together with the Notes thereto and the report thereon dated February 16, 2024, of PricewaterhouseCoopers LLP, the Firm’s independent registered public accounting firm (PCAOB ID 238). sentences: - What type of data does Item 8 in a financial document contain? - How did the assumptions and estimates used for assessing the fair value of reporting units potentially impact the company's financial statements? - What factors are considered when making estimates for financial statements? 
pipeline_tag: sentence-similarity library_name: sentence-transformers metrics: - cosine_accuracy@1 - cosine_accuracy@3 - cosine_accuracy@5 - cosine_accuracy@10 - cosine_precision@1 - cosine_precision@3 - cosine_precision@5 - cosine_precision@10 - cosine_recall@1 - cosine_recall@3 - cosine_recall@5 - cosine_recall@10 - cosine_ndcg@10 - cosine_mrr@10 - cosine_map@100 model-index: - name: BGE base Financial Matryoshka results: - task: type: information-retrieval name: Information Retrieval dataset: name: dim 768 type: dim_768 metrics: - type: cosine_accuracy@1 value: 0.20411392405063292 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.39082278481012656 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.45569620253164556 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.5427215189873418 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.20411392405063292 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.1302742616033755 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.0911392405063291 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.054272151898734175 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.20411392405063292 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.39082278481012656 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.45569620253164556 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.5427215189873418 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.3712962481916349 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.31667482921438606 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.32569334518419213 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 512 type: dim_512 metrics: - type: cosine_accuracy@1 value: 0.1787974683544304 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.38449367088607594 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.44936708860759494 name: Cosine 
Accuracy@5 - type: cosine_accuracy@10 value: 0.5221518987341772 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.1787974683544304 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.1281645569620253 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.08987341772151898 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.05221518987341772 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.1787974683544304 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.38449367088607594 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.44936708860759494 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.5221518987341772 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.35214780800723905 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.2974972372915411 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.30719274754259535 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 256 type: dim_256 metrics: - type: cosine_accuracy@1 value: 0.17563291139240506 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.33860759493670883 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.3924050632911392 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.49683544303797467 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.17563291139240506 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.11286919831223628 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.07848101265822786 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.04968354430379747 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.17563291139240506 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.33860759493670883 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.3924050632911392 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.49683544303797467 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.32777016757909155 name: Cosine 
Ndcg@10 - type: cosine_mrr@10 value: 0.2748675155716295 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.2839854758498125 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 128 type: dim_128 metrics: - type: cosine_accuracy@1 value: 0.13449367088607594 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.27689873417721517 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.34335443037974683 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.40189873417721517 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.13449367088607594 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.09229957805907173 name: Cosine Precision@3 - type: cosine_precision@5 value: 0.06867088607594937 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.04018987341772152 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.13449367088607594 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.27689873417721517 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.34335443037974683 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.40189873417721517 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.2642535058721437 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.2206462226240707 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.2315340997045677 name: Cosine Map@100 - task: type: information-retrieval name: Information Retrieval dataset: name: dim 64 type: dim_64 metrics: - type: cosine_accuracy@1 value: 0.08544303797468354 name: Cosine Accuracy@1 - type: cosine_accuracy@3 value: 0.19462025316455697 name: Cosine Accuracy@3 - type: cosine_accuracy@5 value: 0.24841772151898733 name: Cosine Accuracy@5 - type: cosine_accuracy@10 value: 0.31645569620253167 name: Cosine Accuracy@10 - type: cosine_precision@1 value: 0.08544303797468354 name: Cosine Precision@1 - type: cosine_precision@3 value: 0.06487341772151899 name: Cosine Precision@3 - type: cosine_precision@5 value: 
0.04968354430379747 name: Cosine Precision@5 - type: cosine_precision@10 value: 0.031645569620253174 name: Cosine Precision@10 - type: cosine_recall@1 value: 0.08544303797468354 name: Cosine Recall@1 - type: cosine_recall@3 value: 0.19462025316455697 name: Cosine Recall@3 - type: cosine_recall@5 value: 0.24841772151898733 name: Cosine Recall@5 - type: cosine_recall@10 value: 0.31645569620253167 name: Cosine Recall@10 - type: cosine_ndcg@10 value: 0.19364593797751115 name: Cosine Ndcg@10 - type: cosine_mrr@10 value: 0.15531381856540089 name: Cosine Mrr@10 - type: cosine_map@100 value: 0.16408720453627956 name: Cosine Map@100
---

# BGE base Financial Matryoshka

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en) on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [BAAI/bge-base-en](https://huggingface.co/BAAI/bge-base-en)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - json
- **Language:** en
- **License:** apache-2.0

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("RK-1235/bge-base-FIR-matryoshka-BASELINE-10epochs-FT")
# Run inference
sentences = [
    'Item 8. Financial Statements and Supplementary Data. The Consolidated Financial Statements, together with the Notes thereto and the report thereon dated February 16, 2024, of PricewaterhouseCoopers LLP, the Firm’s independent registered public accounting firm (PCAOB ID 238).',
    'What type of data does Item 8 in a financial document contain?',
    "How did the assumptions and estimates used for assessing the fair value of reporting units potentially impact the company's financial statements?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Evaluation

### Metrics

#### Information Retrieval

* Dataset: `dim_768`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 768
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.2041     |
| cosine_accuracy@3   | 0.3908     |
| cosine_accuracy@5   | 0.4557     |
| cosine_accuracy@10  | 0.5427     |
| cosine_precision@1  | 0.2041     |
| cosine_precision@3  | 0.1303     |
| cosine_precision@5  | 0.0911     |
| cosine_precision@10 | 0.0543     |
| cosine_recall@1     | 0.2041     |
| cosine_recall@3     | 0.3908     |
| cosine_recall@5     | 0.4557     |
| cosine_recall@10    | 0.5427     |
| **cosine_ndcg@10**  | **0.3713** |
| cosine_mrr@10       | 0.3167     |
| cosine_map@100      | 0.3257     |

#### Information Retrieval

* Dataset: `dim_512`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 512
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.1788     |
| cosine_accuracy@3   | 0.3845     |
| cosine_accuracy@5   | 0.4494     |
| cosine_accuracy@10  | 0.5222     |
| cosine_precision@1  | 0.1788     |
| cosine_precision@3  | 0.1282     |
| cosine_precision@5  | 0.0899     |
| cosine_precision@10 | 0.0522     |
| cosine_recall@1     | 0.1788     |
| cosine_recall@3     | 0.3845     |
| cosine_recall@5     | 0.4494     |
| cosine_recall@10    | 0.5222     |
| **cosine_ndcg@10**  | **0.3521** |
| cosine_mrr@10       | 0.2975     |
| cosine_map@100      | 0.3072     |

#### Information Retrieval

* Dataset: `dim_256`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 256
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.1756     |
| cosine_accuracy@3   | 0.3386     |
| cosine_accuracy@5   | 0.3924     |
| cosine_accuracy@10  | 0.4968     |
| cosine_precision@1  | 0.1756     |
| cosine_precision@3  | 0.1129     |
| cosine_precision@5  | 0.0785     |
| cosine_precision@10 | 0.0497     |
| cosine_recall@1     | 0.1756     |
| cosine_recall@3     | 0.3386     |
| cosine_recall@5     | 0.3924     |
| cosine_recall@10    | 0.4968     |
| **cosine_ndcg@10**  | **0.3278** |
| cosine_mrr@10       | 0.2749     |
| cosine_map@100      | 0.284      |

#### Information Retrieval

* Dataset: `dim_128`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 128
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.1345     |
| cosine_accuracy@3   | 0.2769     |
| cosine_accuracy@5   | 0.3434     |
| cosine_accuracy@10  | 0.4019     |
| cosine_precision@1  | 0.1345     |
| cosine_precision@3  | 0.0923     |
| cosine_precision@5  | 0.0687     |
| cosine_precision@10 | 0.0402     |
| cosine_recall@1     | 0.1345     |
| cosine_recall@3     | 0.2769     |
| cosine_recall@5     | 0.3434     |
| cosine_recall@10    | 0.4019     |
| **cosine_ndcg@10**  | **0.2643** |
| cosine_mrr@10       | 0.2206     |
| cosine_map@100      | 0.2315     |

#### Information Retrieval

* Dataset: `dim_64`
* Evaluated with [InformationRetrievalEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) with these parameters:
  ```json
  {
      "truncate_dim": 64
  }
  ```

| Metric              | Value      |
|:--------------------|:-----------|
| cosine_accuracy@1   | 0.0854     |
| cosine_accuracy@3   | 0.1946     |
| cosine_accuracy@5   | 0.2484     |
| cosine_accuracy@10  | 0.3165     |
| cosine_precision@1  | 0.0854     |
| cosine_precision@3  | 0.0649     |
| cosine_precision@5  | 0.0497     |
| cosine_precision@10 | 0.0316     |
| cosine_recall@1     | 0.0854     |
| cosine_recall@3     | 0.1946     |
| cosine_recall@5     | 0.2484     |
| cosine_recall@10    | 0.3165     |
| **cosine_ndcg@10**  | **0.1936** |
| cosine_mrr@10       | 0.1553     |
| cosine_map@100      | 0.1641     |

## Training Details

### Training Dataset

#### json

* Dataset: json
* Size: 6,300 training samples
* Columns: `positive` and `anchor`
* Approximate statistics based on the first 1000 samples:

|         | positive | anchor |
|:--------|:---------|:-------|
| type    | string   | string |
| details |          |        |

* Samples:

| positive | anchor |
|:---------|:-------|
| As of December 31, 2023, a 5 percent change in the contingent consideration liabilities would result in a change in income before income taxes of $5.2 million. | How would a 5% change in the contingent consideration liabilities impact income before taxes as of December 31, 2023? |
| NIKE, Inc.'s principal business activity involves the design, development, and worldwide marketing and selling of athletic footwear, apparel, equipment, accessories, and services. | What is the principal business activity of NIKE, Inc.? |
| During 2023, changes in foreign currencies relative to the U.S. dollar negatively impacted net sales by approximately $3,484, 156 basis points, compared to 2022, attributable to our Canadian and Other International operations. | What was the overall impact of foreign currencies on net sales in 2023? |

* Loss: [MatryoshkaLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
  ```json
  {
      "loss": "MultipleNegativesRankingLoss",
      "matryoshka_dims": [
          768,
          512,
          256,
          128,
          64
      ],
      "matryoshka_weights": [
          1,
          1,
          1,
          1,
          1
      ],
      "n_dims_per_step": -1
  }
  ```

### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `num_train_epochs`: 10
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: True
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates

#### All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: True
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

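
The batch-size settings listed above determine how many optimizer steps one epoch takes. The following is a rough sketch of that arithmetic, assuming a single training device (the card does not state the device count) and that no batches are dropped; the `no_duplicates` sampler can change the batch count slightly, so treat the result as approximate. Under those assumptions it gives the 13 steps per epoch and 130 total steps that appear in the training logs.

```python
import math

# Values taken from the hyperparameter list in this card.
train_samples = 6300
per_device_train_batch_size = 32
gradient_accumulation_steps = 16
num_train_epochs = 10

# Assumption: one training device (not stated in the card).
num_devices = 1

# Number of forward/backward passes (micro-batches) per epoch.
micro_batches_per_epoch = math.ceil(
    train_samples / (per_device_train_batch_size * num_devices)
)

# One optimizer step per `gradient_accumulation_steps` micro-batches;
# the leftover micro-batches at the end of an epoch form one extra step.
steps_per_epoch = math.ceil(micro_batches_per_epoch / gradient_accumulation_steps)
total_steps = steps_per_epoch * num_train_epochs

# Samples contributing to each full optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Fractional epoch reached after 10 optimizer steps (the log reports 0.8122).
epoch_at_step_10 = 10 * gradient_accumulation_steps / micro_batches_per_epoch

print(f"micro-batches per epoch: {micro_batches_per_epoch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"effective batch size: {effective_batch_size}")
print(f"epoch at step 10: {epoch_at_step_10:.4f}")
```

Gradient accumulation enlarges the batch used for each weight update, but each forward pass still sees only 32 pairs.
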
### Training Logs

| Epoch   | Step   | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
|:-------:|:------:|:-------------:|:----------------------:|:----------------------:|:----------------------:|:----------------------:|:---------------------:|
| 0.8122  | 10     | 89.0763       | -                      | -                      | -                      | -                      | -                     |
| **1.0** | **13** | **-**         | **0.4022**             | **0.3835**             | **0.3505**             | **0.2911**             | **0.1835**            |
| 1.5685  | 20     | 36.7538       | -                      | -                      | -                      | -                      | -                     |
| 2.0     | 26     | -             | 0.3725                 | 0.3591                 | 0.3218                 | 0.2753                 | 0.1978                |
| 2.3249  | 30     | 17.7869       | -                      | -                      | -                      | -                      | -                     |
| 3.0     | 39     | -             | 0.3680                 | 0.3558                 | 0.3284                 | 0.2638                 | 0.2000                |
| 3.0812  | 40     | 10.5904       | -                      | -                      | -                      | -                      | -                     |
| 3.8934  | 50     | 7.9568        | -                      | -                      | -                      | -                      | -                     |
| 4.0     | 52     | -             | 0.3634                 | 0.3487                 | 0.3245                 | 0.2589                 | 0.1999                |
| 4.6497  | 60     | 5.5002        | -                      | -                      | -                      | -                      | -                     |
| 5.0     | 65     | -             | 0.3648                 | 0.3551                 | 0.3211                 | 0.2595                 | 0.1968                |
| 5.4061  | 70     | 5.3314        | -                      | -                      | -                      | -                      | -                     |
| 6.0     | 78     | -             | 0.3693                 | 0.3548                 | 0.3257                 | 0.2621                 | 0.1977                |
| 6.1624  | 80     | 4.6165        | -                      | -                      | -                      | -                      | -                     |
| 6.9746  | 90     | 4.7811        | -                      | -                      | -                      | -                      | -                     |
| 7.0     | 91     | -             | 0.3698                 | 0.3532                 | 0.3293                 | 0.2637                 | 0.1954                |
| 7.7310  | 100    | 3.978         | -                      | -                      | -                      | -                      | -                     |
| 8.0     | 104    | -             | 0.3713                 | 0.3523                 | 0.3273                 | 0.2637                 | 0.1952                |
| 8.4873  | 110    | 4.1624        | -                      | -                      | -                      | -                      | -                     |
| 9.0     | 117    | -             | 0.3707                 | 0.3517                 | 0.3264                 | 0.2639                 | 0.1949                |
| 9.2437  | 120    | 3.4956        | -                      | -                      | -                      | -                      | -                     |
| 10.0    | 130    | 3.9661        | 0.3713                 | 0.3521                 | 0.3278                 | 0.2643                 | 0.1936                |

* The bold row denotes the saved checkpoint.

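
Each `dim_*` column above scores embeddings cut down to their first 768, 512, 256, 128 or 64 components, which is what the `truncate_dim` evaluator parameter controls. As a minimal illustration of that idea, the sketch below keeps the leading components of a vector and rescales the result to unit length so that a dot product is again a cosine similarity. The helper names and the 8-dimensional toy vectors are hypothetical and are not model output. In practice, `SentenceTransformer` accepts a `truncate_dim` argument at load time to do the truncation.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine(a, b):
    """Dot product of two unit-length vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


# Toy 8-dimensional "embeddings" (hypothetical values for illustration only).
query = [0.9, 0.1, 0.3, -0.2, 0.05, 0.0, -0.1, 0.2]
passage = [0.8, 0.2, 0.25, -0.1, -0.3, 0.4, 0.1, -0.2]

# Compare the same pair at progressively smaller dimensionalities.
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine(q, p), 4))
```

Re-normalizing after truncation matters because the model's final `Normalize()` layer only guarantees unit length for the full 768-dimensional vector, not for a prefix of it.
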
### Framework Versions

- Python: 3.10.12
- Sentence Transformers: 4.1.0
- Transformers: 4.52.2
- PyTorch: 2.6.0+cu124
- Accelerate: 1.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MatryoshkaLoss

```bibtex
@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}
```

#### MultipleNegativesRankingLoss

```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```