This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base on the legal-rag-positives-synthetic dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Full model architecture:

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```
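The last two stages above are simple: mean pooling averages the token embeddings (ignoring padding), and `Normalize()` L2-normalizes the result. A minimal sketch of what they compute, with dummy tensors standing in for ModernBERT's token embeddings (this mirrors, rather than reproduces, the library's `Pooling` module):

```python
import torch

def mean_pool(token_embeddings, attention_mask):
    # Average token embeddings over the sequence, ignoring padded positions.
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(dim=1)    # (batch, dim)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # (batch, 1)
    return summed / counts

# Dummy batch: 2 sequences, 4 tokens each, 768-dim token embeddings.
token_embeddings = torch.randn(2, 4, 768)
attention_mask = torch.tensor([[1, 1, 1, 0], [1, 1, 0, 0]])

pooled = mean_pool(token_embeddings, attention_mask)
sentence_embeddings = torch.nn.functional.normalize(pooled, p=2, dim=1)
print(sentence_embeddings.shape)        # torch.Size([2, 768])
print(sentence_embeddings.norm(dim=1))  # ~[1., 1.]
```

Because of the final normalization step, downstream cosine similarity reduces to a dot product.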
First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference:

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("aaa961/modernbert-embed-base-legal-no_MRL_reverse_dataset")

# Run inference
sentences = [
    'confidentiality agreement/order, that remain following those discussions. This is a \nfinal report and notice of exceptions shall be filed within three days of the date of \nthis report, pursuant to Court of Chancery Rule 144(d)(2), given the expedited and \nsummary nature of Section 220 proceedings. \n \n \n \n \n \n \n \nRespectfully, \n \n \n \n \n \n \n \n \n/s/ Patricia W. Griffin',
    'According to which court rule must the notice of exceptions be filed?',
    'decides whether to submit proposals on future procurements, and excluding mentor-protégé JVs \nfrom proposing on a solicitation due to Section 125.9(b)(3)(i) unnecessarily prevents protégés from \naccessing opportunities to grow as a business. SHS MJAR at 22–23; VCH MJAR at 22–23. \nSuch a critique, however, merely highlights Plaintiffs’ disagreement with the SBA’s',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4959, 0.0333],
#         [0.4959, 1.0000, 0.0378],
#         [0.0333, 0.0378, 1.0000]])
```
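Since the trailing `Normalize()` module makes every embedding unit-length, the cosine-similarity matrix shown above reduces to a plain matrix product. A small sketch with random unit vectors standing in for `model.encode` output:

```python
import torch

# Stand-ins for model.encode output: rows are already L2-normalized,
# as the model's Normalize() module guarantees.
embeddings = torch.nn.functional.normalize(torch.randn(3, 768), dim=1)

# For unit vectors, cosine similarity is just a dot product.
similarities = embeddings @ embeddings.T

print(similarities.shape)   # torch.Size([3, 3])
print(similarities.diag())  # ~[1., 1., 1.]  (each vector vs. itself)
```

The result is symmetric with ones on the diagonal, matching the structure of the tensor printed above.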
The model was evaluated with `InformationRetrievalEvaluator` (evaluator name: `ir_eval`):

| Metric | Value |
|---|---|
| cosine_accuracy@1 | 0.609 |
| cosine_accuracy@3 | 0.7635 |
| cosine_accuracy@5 | 0.8223 |
| cosine_accuracy@10 | 0.8825 |
| cosine_precision@1 | 0.609 |
| cosine_precision@3 | 0.2545 |
| cosine_precision@5 | 0.1645 |
| cosine_precision@10 | 0.0883 |
| cosine_recall@1 | 0.609 |
| cosine_recall@3 | 0.7635 |
| cosine_recall@5 | 0.8223 |
| cosine_recall@10 | 0.8825 |
| cosine_ndcg@10 | 0.7413 |
| cosine_mrr@10 | 0.6965 |
| cosine_map@100 | 0.701 |
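Note that `accuracy@k` and `recall@k` coincide in this table: with exactly one relevant document per query, recall@k is 1 when that document appears in the top-k and 0 otherwise, which is the same quantity accuracy@k measures. A toy illustration of how these values are averaged over queries (the document IDs here are made up):

```python
import numpy as np

def accuracy_at_k(ranked_ids, relevant_id, k):
    # 1 if the single relevant document appears in the top-k results, else 0.
    return int(relevant_id in ranked_ids[:k])

# Toy run: 4 queries, each with a ranked result list and one relevant doc.
results = [
    (["d3", "d7", "d1"], "d3"),   # hit at rank 1
    (["d2", "d9", "d4"], "d9"),   # hit at rank 2
    (["d5", "d6", "d8"], "d8"),   # hit at rank 3
    (["d1", "d2", "d3"], "d0"),   # miss
]
for k in (1, 3):
    acc = np.mean([accuracy_at_k(ranked, rel, k) for ranked, rel in results])
    print(f"accuracy@{k} = {acc}")  # accuracy@1 = 0.25, accuracy@3 = 0.75
```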
Training dataset: legal-rag-positives-synthetic, with two string columns, `anchor` and `positive`. Sample rows:

| anchor | positive |
|---|---|
| What kinds of issues are mentioned in connection with wrongdoing? | mismanagement, waste and wrongdoing – and that it has demonstrated more than a |
| What page reference is given for the Lombardo v. Handler case in the aforementioned citation? | Project, 504 F.2d at 248 n.15). |
| Where can more detailed information regarding redactions be found? | parties specifically with respect to the FOIA request at issue in Count Eighteen of No. 11-444. This is likely |
Loss: `MultipleNegativesRankingLoss` with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim",
    "gather_across_devices": false,
    "directions": [
        "query_to_doc"
    ],
    "partition_mode": "joint",
    "hardness_mode": null,
    "hardness_strength": 0.0
}
```
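`MultipleNegativesRankingLoss` uses in-batch negatives: each anchor is scored against every positive in the batch with cosine similarity, the scores are multiplied by the `scale` above, and cross-entropy pushes the matching pair's score above the rest. A minimal sketch of that core objective on dummy embeddings (options such as `partition_mode` and `hardness_mode` are omitted):

```python
import torch
import torch.nn.functional as F

scale = 20.0  # matches the "scale" parameter above

# Dummy anchor/positive embeddings for a batch of 4 pairs, unit-normalized
# so the dot product below is cosine similarity ("similarity_fct": "cos_sim").
anchors = F.normalize(torch.randn(4, 768), dim=1)
positives = F.normalize(torch.randn(4, 768), dim=1)

# Score matrix: row i = anchor i vs. every positive in the batch.
scores = anchors @ positives.T * scale  # (4, 4)

# The matching positive for anchor i sits at column i (the diagonal),
# so the cross-entropy target for row i is simply i.
labels = torch.arange(scores.size(0))
loss = F.cross_entropy(scores, labels)
print(loss)  # scalar loss tensor
```

Larger batches give more in-batch negatives for free, which is why this loss pairs well with the batch size of 32 and gradient accumulation of 16 used here.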
Non-default training hyperparameters:

- `per_device_train_batch_size`: 32
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `num_train_epochs`: 4
- `learning_rate`: 2e-05
- `lr_scheduler_type`: cosine
- `warmup_steps`: 0.1
- `optim`: adamw_torch_fused
- `bf16`: True
- `tf32`: True
- `eval_strategy`: epoch
- `load_best_model_at_end`: True

The remaining training arguments (e.g. `seed`: 42, `max_grad_norm`: 1.0, `weight_decay`: 0.0, no gradient checkpointing) were left at their library defaults.

Training logs:

| Epoch | Step | Training Loss | ir_eval_cosine_ndcg@10 |
|---|---|---|---|
| -1 | -1 | - | 0.5028 |
| 0.4396 | 10 | 1.4219 | - |
| 0.8791 | 20 | 0.6942 | - |
| 1.0 | 23 | - | 0.6978 |
| 1.3077 | 30 | 0.4762 | - |
| 1.7473 | 40 | 0.4019 | - |
| 2.0 | 46 | - | 0.7337 |
| 2.1758 | 50 | 0.3741 | - |
| 2.6154 | 60 | 0.3149 | - |
| 3.0 | 69 | - | 0.7397 |
| 3.0440 | 70 | 0.3330 | - |
| 3.4835 | 80 | 0.2684 | - |
| 3.9231 | 90 | 0.3183 | - |
| 4.0 | 92 | - | 0.7413 |
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

```bibtex
@misc{oord2019representationlearningcontrastivepredictive,
    title={Representation Learning with Contrastive Predictive Coding},
    author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
    year={2019},
    eprint={1807.03748},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/1807.03748},
}
```
Base model: answerdotai/ModernBERT-base