ModernBERT Embed Base Legal Fine-tuned

This is a sentence-transformers model fine-tuned from nomic-ai/modernbert-embed-base on the legal-rag-positives-synthetic dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model type: Sentence Transformer
  • Base model: nomic-ai/modernbert-embed-base
  • Maximum sequence length: 8192 tokens
  • Output dimensionality: 768 dimensions
  • Similarity function: cosine similarity
  • Training dataset: legal-rag-positives-synthetic

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'ModernBertModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
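The module stack above (transformer, mask-aware mean pooling, L2 normalization) can be illustrated without the model itself. A minimal sketch with NumPy, using made-up token embeddings in place of ModernBERT's output:

```python
import numpy as np

# Toy stand-in for the transformer output: (num_tokens, hidden_dim)
token_embeddings = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
# Attention mask: 1 for real tokens, 0 for padding
attention_mask = np.array([1, 1, 0])

# (1) Pooling with pooling_mode_mean_tokens: mean over non-padding tokens
masked = token_embeddings * attention_mask[:, None]
mean_pooled = masked.sum(axis=0) / attention_mask.sum()

# (2) Normalize(): scale to unit L2 norm, so dot product equals cosine similarity
embedding = mean_pooled / np.linalg.norm(mean_pooled)

print(mean_pooled)                # [2. 3.]
print(np.linalg.norm(embedding))  # 1.0
```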

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("aaa961/modernbert-embed-base-legal_no_MRL_reverse_dataset_early_stopping")
# Run inference
sentences = [
    'confidentiality agreement/order, that remain following those discussions.  This is a \nfinal report and notice of exceptions shall be filed within three days of the date of \nthis report, pursuant to Court of Chancery Rule 144(d)(2), given the expedited and \nsummary nature of Section 220 proceedings.  \n \n \n \n \n \n \n \nRespectfully, \n \n \n \n \n \n \n \n \n/s/ Patricia W. Griffin',
    'According to which court rule must the notice of exceptions be filed?',
    'decides whether to submit proposals on future procurements, and excluding mentor-protégé JVs \nfrom proposing on a solicitation due to Section 125.9(b)(3)(i) unnecessarily prevents protégés from \naccessing opportunities to grow as a business.  SHS MJAR at 22–23; VCH MJAR at 22–23.   \nSuch a critique, however, merely highlights Plaintiffs’ disagreement with the SBA’s',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.4922, 0.0280],
#         [0.4922, 1.0000, 0.0389],
#         [0.0280, 0.0389, 1.0000]])
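Because the model ends in a Normalize() module, `model.similarity` (cosine by default) reduces to a plain matrix product of the embeddings. A stand-alone sketch of that computation with NumPy, using random unit vectors in place of real model output:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for model.encode output: 3 unit-normalized 768-d vectors
emb = rng.normal(size=(3, 768))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# Cosine similarity matrix; for unit vectors this is just emb @ emb.T
similarities = emb @ emb.T

print(similarities.shape)                         # (3, 3)
# Self-similarity is 1.0 on the diagonal, as in the tensor shown above
print(np.allclose(np.diag(similarities), 1.0))    # True
```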

Evaluation

Metrics

Information Retrieval

Metric               ir       ir_eval
cosine_accuracy@1    0.3617   0.6337
cosine_accuracy@3    0.4096   0.6893
cosine_accuracy@5    0.4730   0.7759
cosine_accuracy@10   0.5255   0.8377
cosine_precision@1   0.3617   0.6337
cosine_precision@3   0.3524   0.6028
cosine_precision@5   0.2770   0.4566
cosine_precision@10  0.1631   0.2583
cosine_recall@1      0.1269   0.2246
cosine_recall@3      0.3452   0.5938
cosine_recall@5      0.4405   0.7313
cosine_recall@10     0.5161   0.8237
cosine_ndcg@10       0.4459   0.7360
cosine_mrr@10        0.4019   0.6825
cosine_map@100       0.4420   0.7181
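These retrieval metrics follow standard definitions: accuracy@k asks whether any relevant document appears in the top k results, and MRR@10 averages the reciprocal rank of the first relevant hit. A minimal sketch of the two (over hypothetical rankings, not this model's output):

```python
# Hypothetical relevance judgments for 3 queries: each list marks
# whether the document at that rank is relevant.
rankings = [
    [True, False, False],   # first hit at rank 1
    [False, False, True],   # first hit at rank 3
    [False, False, False],  # no hit in top k
]

def accuracy_at_k(rankings, k):
    """Fraction of queries with at least one relevant doc in the top k."""
    return sum(any(r[:k]) for r in rankings) / len(rankings)

def mrr_at_k(rankings, k):
    """Mean reciprocal rank of the first relevant doc within the top k."""
    total = 0.0
    for r in rankings:
        for rank, rel in enumerate(r[:k], start=1):
            if rel:
                total += 1.0 / rank
                break
    return total / len(rankings)

print(accuracy_at_k(rankings, 1))  # 0.3333...
print(mrr_at_k(rankings, 3))       # (1 + 1/3 + 0) / 3 = 0.4444...
```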

Training Details

Training Dataset

legal-rag-positives-synthetic

  • Dataset: legal-rag-positives-synthetic at f11534a
  • Size: 11,644 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    - anchor: string; min 7 tokens, mean 57.45 tokens, max 160 tokens
    - positive: string; min 8 tokens, mean 57.77 tokens, max 157 tokens
  • Samples:
    - anchor: What kinds of issues are mentioned in connection with wrongdoing?
      positive: mismanagement, waste and wrongdoing – and that it has demonstrated more than a credible basis from which the Court can infer possible mismanagement. It claims DR’s management failed to follow corporate governance mechanics and made critical business decisions without consulting with the Board or stockholders; failed to act with due diligence related to undertaking an ICO and discontinuing
    - anchor: What page reference is given for the Lombardo v. Handler case in the aforementioned citation?
      positive: Project, 504 F.2d at 248 n.15). More, the requirement of “substantial” authority suggests that the entity should be at the “center of gravity in the exercise of administrative power.” Id. at 882 (quoting Lombardo v. Handler, 397 F. Supp. 792, 796 (D.D.C. 1975), aff’d, 546 F.2d 1043 (D.C. Cir. 1976)). On this
    - anchor: Where can more detailed information regarding redactions be found?
      positive: parties specifically with respect to the FOIA request at issue in Count Eighteen of No. 11-444. This is likely because the CIA has previously instituted a categorical policy of indicating the basis for redactions at a document level, rather than a redaction level, as discussed above. See supra Part III.C.2. In light of the Court’s holding that
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
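MultipleNegativesRankingLoss treats every other positive in the batch as a negative: for each anchor it computes cosine similarities to all in-batch positives, multiplies them by the scale (20.0 here), and applies cross-entropy with the matching positive as the target class. A minimal NumPy sketch of that objective (toy one-hot embeddings, not the library's implementation):

```python
import numpy as np

def mnrl_loss(anchors, positives, scale=20.0):
    """In-batch contrastive loss: cross-entropy over scaled cosine
    similarities, with positives[i] as the target for anchors[i]."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                   # "scale": 20.0, cos_sim
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# Toy batch of 4 one-hot "embeddings": matched pairs give near-zero loss,
# while mismatched (rolled) pairs are heavily penalized.
batch = np.eye(4)
print(round(mnrl_loss(batch, batch), 4))                 # 0.0
print(mnrl_loss(batch, np.roll(batch, 1, axis=0)) > 19)  # True
```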
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 4
  • learning_rate: 2e-05
  • lr_scheduler_type: cosine
  • warmup_steps: 0.1
  • optim: adamw_torch_fused
  • gradient_accumulation_steps: 16
  • bf16: True
  • tf32: True
  • eval_strategy: epoch
  • per_device_eval_batch_size: 16
  • load_best_model_at_end: True
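With gradient accumulation, the effective batch size is per_device_train_batch_size × gradient_accumulation_steps = 32 × 16 = 512, which matters for MultipleNegativesRankingLoss since it sets the pool of in-batch negatives. A quick sanity check, assuming single-device training:

```python
import math

per_device_train_batch_size = 32
gradient_accumulation_steps = 16
num_devices = 1                  # assumption: single-GPU run
train_samples = 11_644           # size of legal-rag-positives-synthetic

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_devices)
steps_per_epoch = math.ceil(train_samples / effective_batch_size)

print(effective_batch_size)  # 512
print(steps_per_epoch)       # 23
```

The 23 optimizer steps per epoch are consistent with the training logs, where epoch 1.0 falls at step 23.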

All Hyperparameters

  • per_device_train_batch_size: 32
  • num_train_epochs: 4
  • max_steps: -1
  • learning_rate: 2e-05
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.1
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 16
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • eval_strategy: epoch
  • per_device_eval_batch_size: 16
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch    Step  Training Loss  ir_cosine_ndcg@10  ir_eval_cosine_ndcg@10
-1       -1    -              0.4459             0.4459
0.4396   10    1.4221         -                  -
0.8791   20    0.6964         -                  -
1.0      23    -              -                  0.6760
1.3077   30    0.4787         -                  -
1.7473   40    0.4033         -                  -
2.0      46    -              -                  0.7196
2.1758   50    0.3770         -                  -
2.6154   60    0.3159         -                  -
3.0      69    -              -                  0.7361
3.0440   70    0.3345         -                  -
3.4835   80    0.2698         -                  -
3.9231   90    0.3188         -                  -
4.0      92    -              -                  0.7360
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.3.0
  • Transformers: 5.3.0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.13.0
  • Datasets: 4.8.2
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}
Model size: 0.1B parameters (Safetensors, F32)