ModernBERT Embed base Legal Matryoshka

This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/modernbert-embed-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
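
The Pooling and Normalize stages in this stack are straightforward to describe. As a minimal pure-Python sketch (not the library's implementation), mean pooling averages the token vectors at non-padding positions, and normalization rescales the result to unit length; toy 4-dim vectors stand in for the 768-dim hidden states:

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, counting only non-padding positions."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:
            count += 1
            for i, v in enumerate(vec):
                summed[i] += v
    return [s / count for s in summed]

def l2_normalize(vec):
    """Scale a vector to unit length, as the Normalize() module does."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

# Toy example: three 4-dim token vectors, the last one is padding.
tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]]
mask = [1, 1, 0]
sentence_embedding = l2_normalize(mean_pool(tokens, mask))
print(sentence_embedding)  # unit-length mean of the two real tokens
```

Because the output is unit-normalized, cosine similarity between two embeddings reduces to a plain dot product.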

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("stardriver007/modernbert-embed-base-legal-matryoshka-qlin")
# Run inference
sentences = [
    'C05362492, C05363265, C05363815, C05365820, C05366449, C05366894, C05366895, C05366902, C-5371430, \nC05371431, C05371432, C05371433, C05375987, C05403192, C05549840, C05486085, C05498760, C05498761, \nand C05548237. \n135 \n \nis claimed as to entire documents or only portions of documents.  Particularly in light of the',
    'What is the document number associated with C05549840?',
    'What do Plaintiffs incorrectly suggest regarding Section 125.9?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
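
Because the model was trained with MatryoshkaLoss at dimensions 768/512/256/128/64 (see Training Details), its embeddings can be truncated to a leading prefix and renormalized at modest quality cost; recent Sentence Transformers versions expose this via the `truncate_dim` argument to `SentenceTransformer`. A minimal sketch of what that truncation amounts to, with toy 8-dim unit vectors standing in for the model's 768-dim output:

```python
import math

def truncate_and_renormalize(embedding, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))  # unit vectors: dot = cosine

# Toy 8-dim embeddings truncated to 4 dims.
e1 = truncate_and_renormalize([0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0], 4)
e2 = truncate_and_renormalize([0.5, 0.5, 0.0, 0.0, 0.5, 0.5, 0.0, 0.0], 4)
print(cosine(e1, e2))  # ≈ 0.7071
```

For this model the useful truncation points are the trained Matryoshka dimensions (512, 256, 128, 64); the per-dimension retrieval metrics below quantify the trade-off.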

Evaluation

Metrics

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.5193
cosine_accuracy@3 0.558
cosine_accuracy@5 0.6615
cosine_accuracy@10 0.7388
cosine_precision@1 0.5193
cosine_precision@3 0.493
cosine_precision@5 0.3827
cosine_precision@10 0.2309
cosine_recall@1 0.1823
cosine_recall@3 0.4808
cosine_recall@5 0.6091
cosine_recall@10 0.7306
cosine_ndcg@10 0.6293
cosine_mrr@10 0.5679
cosine_map@100 0.6105

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.5147
cosine_accuracy@3 0.5549
cosine_accuracy@5 0.6553
cosine_accuracy@10 0.7311
cosine_precision@1 0.5147
cosine_precision@3 0.4879
cosine_precision@5 0.3796
cosine_precision@10 0.226
cosine_recall@1 0.1816
cosine_recall@3 0.4773
cosine_recall@5 0.6059
cosine_recall@10 0.7164
cosine_ndcg@10 0.6205
cosine_mrr@10 0.5626
cosine_map@100 0.6045

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.476
cosine_accuracy@3 0.5131
cosine_accuracy@5 0.6151
cosine_accuracy@10 0.6785
cosine_precision@1 0.476
cosine_precision@3 0.4518
cosine_precision@5 0.3564
cosine_precision@10 0.21
cosine_recall@1 0.1669
cosine_recall@3 0.4401
cosine_recall@5 0.5648
cosine_recall@10 0.6649
cosine_ndcg@10 0.5752
cosine_mrr@10 0.5213
cosine_map@100 0.5648

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.4173
cosine_accuracy@3 0.4606
cosine_accuracy@5 0.5332
cosine_accuracy@10 0.6074
cosine_precision@1 0.4173
cosine_precision@3 0.3957
cosine_precision@5 0.3107
cosine_precision@10 0.1878
cosine_recall@1 0.1491
cosine_recall@3 0.3888
cosine_recall@5 0.4916
cosine_recall@10 0.5935
cosine_ndcg@10 0.5104
cosine_mrr@10 0.4592
cosine_map@100 0.504

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.3323
cosine_accuracy@3 0.3648
cosine_accuracy@5 0.4529
cosine_accuracy@10 0.51
cosine_precision@1 0.3323
cosine_precision@3 0.3158
cosine_precision@5 0.2572
cosine_precision@10 0.1584
cosine_recall@1 0.1168
cosine_recall@3 0.3087
cosine_recall@5 0.4104
cosine_recall@10 0.5003
cosine_ndcg@10 0.4206
cosine_mrr@10 0.3717
cosine_map@100 0.4123
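
The table values above follow the standard ranked-retrieval definitions. As a minimal sketch (not the evaluator used to produce these numbers), accuracy@k, precision@k, recall@k, and the reciprocal rank behind MRR can be computed per query as:

```python
def metrics_at_k(ranked_ids, relevant_ids, k):
    """accuracy@k, precision@k, recall@k for one query's ranking."""
    top_k = ranked_ids[:k]
    hits = sum(1 for doc in top_k if doc in relevant_ids)
    accuracy = 1.0 if hits else 0.0   # any relevant doc in the top k?
    precision = hits / k              # fraction of the top k that is relevant
    recall = hits / len(relevant_ids) # fraction of relevant docs retrieved
    return accuracy, precision, recall

def reciprocal_rank(ranked_ids, relevant_ids):
    """1/rank of the first relevant document (the 'RR' in MRR@k)."""
    for rank, doc in enumerate(ranked_ids, start=1):
        if doc in relevant_ids:
            return 1.0 / rank
    return 0.0

# Toy query with relevant docs {"a", "c"} and five ranked candidates.
ranking = ["b", "a", "d", "c", "e"]
acc, prec, rec = metrics_at_k(ranking, {"a", "c"}, 3)
rr = reciprocal_rank(ranking, {"a", "c"})
print(acc, prec, rec, rr)  # 1.0, 1/3, 0.5, 0.5
```

The reported figures are these quantities averaged over all evaluation queries (NDCG@10 additionally discounts hits by log rank).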

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 5,822 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    • positive: string; min 29, mean 96.97, max 170 tokens
    • anchor: string; min 7, mean 16.5, max 49 tokens
  • Samples:
    • positive: "important piece of software excessively difficult to use). A client may require the lawyer to implement special security measures not required by this rule or may give informed consent to forgo security measures that would otherwise be required by this rule. Whether a lawyer may be required to take additional steps to safeguard a"
      anchor: "What can a client give informed consent to forgo?"
    • positive: "examples of withheld but publicly available information to be evidence of “bad faith,” in addition to “general sloppiness,” the Court concludes that such examples are not evidence of bad faith for the same reasons it concludes that they are not evidence of “general sloppiness.” 93 information, and the plaintiff does not ask the Court to order the CIA to disclose any officially"
      anchor: "Is the plaintiff asking the Court to order the CIA to disclose any information?"
    • positive: "does not support such a broad reading. 46 As to category (7) above, the Court does not hold that the CIA is necessarily required to disclose information about intelligence gathering in response to FOIA requests. Rather, the Court narrowly holds that § 403g does not"
      anchor: "Which category number is referenced as an example in this text?"
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
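
A simplified sketch of what this configuration describes (not the library's implementation): MultipleNegativesRankingLoss scores each anchor against all in-batch positives with scaled cosine similarity and applies softmax cross-entropy, and MatryoshkaLoss is the weighted sum of that loss over truncated embedding prefixes. The `scale=20.0` default here is an assumption for illustration:

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)

def mnrl_loss(anchors, positives, scale=20.0):
    """Softmax cross-entropy over in-batch similarities: anchor i's
    positive is row i, and every other row acts as a negative."""
    total = 0.0
    for i, a in enumerate(anchors):
        sims = [scale * cosine(a, p) for p in positives]
        log_denom = math.log(sum(math.exp(s) for s in sims))
        total += log_denom - sims[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over truncated embedding prefixes."""
    return sum(
        w * mnrl_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch: two perfectly matched 4-dim pairs, scored at dims 4 and 2.
anchors = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
positives = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)  # near zero: each anchor already ranks its own positive first
```

Training against every prefix simultaneously is what makes the truncated embeddings usable at inference time.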
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
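
A few derived quantities follow from the hyperparameters above. Assuming a single training device, the effective batch size is 32 × 16 = 512, which over the 5,822 training samples gives 12 optimizer steps per epoch (consistent with the epoch boundaries at steps 12/24/36 in the training logs):

```python
import math

samples = 5822            # training-set size from the dataset section
per_device_batch = 32
grad_accum = 16
epochs = 4
warmup_ratio = 0.1

effective_batch = per_device_batch * grad_accum  # assumes one GPU
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(warmup_ratio * total_steps)  # Trainer rounds up

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
# 512 12 48 5
```

With warmup_ratio 0.1, roughly the first 5 steps ramp the learning rate up before the cosine schedule decays it.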

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • tp_size: 0
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
0.8791 10 90.1248 - - - - -
1.0 12 - 0.5841 0.5786 0.5283 0.4502 0.3541
1.7033 20 39.4375 - - - - -
2.0 24 - 0.6170 0.6097 0.5653 0.5001 0.4072
2.5275 30 29.9004 - - - - -
3.0 36 - 0.6271 0.619 0.5756 0.511 0.419
3.3516 40 24.1188 - - - - -
3.7033 44 - 0.6293 0.6205 0.5752 0.5104 0.4206
  • The saved checkpoint is the final row (epoch 3.7033, step 44), whose NDCG@10 values match the metrics reported above.

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 4.0.1
  • Transformers: 4.50.3
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.5.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}