---
language:
- multilingual
- fr
- de
- zh
- ru
- pl
- es
- it
- ja
- ar
- hi
- pt
- nl
license: mit
tags:
- sentence-transformers
- cross-encoder
- reranker
- generated_from_trainer
- dataset_size:9358
- loss:LambdaLoss
base_model: jhu-clsp/mmBERT-small
datasets:
- Antix5/Product_Similarity_Dataset
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
model-index:
- name: mmBERT-small reranker (LambdaLoss NDCG2++)
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: rerank
      type: rerank
    metrics:
    - type: map
      value: 0.9562
      name: Map
    - type: mrr@10
      value: 0.9561
      name: Mrr@10
    - type: ndcg@10
      value: 0.9656
      name: Ndcg@10
---

# mmBERT-small reranker (LambdaLoss NDCG2++)

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [jhu-clsp/mmBERT-small](https://huggingface.co/jhu-clsp/mmBERT-small) on the [product_similarity_dataset](https://huggingface.co/datasets/Antix5/Product_Similarity_Dataset) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

## Model Details

### Model Description

- **Model Type:** Cross Encoder
- **Base model:** [jhu-clsp/mmBERT-small](https://huggingface.co/jhu-clsp/mmBERT-small)
- **Maximum Sequence Length:** 256 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
  - [product_similarity_dataset](https://huggingface.co/datasets/Antix5/Product_Similarity_Dataset)
- **Languages:** multilingual, fr, de, zh, ru, pl, es, it, ja, ar, hi, pt, nl
- **License:** mit

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

**Warning:** This model has only just begun training; this upload is an early checkpoint.

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("Antix5/product-reranker-mmBERT-small")

# Get scores for pairs of texts
pairs = [
    ['Milk Belgian Chocolate', 'Milk Chocolate Flavor'],
]
scores = model.predict(pairs)
print(scores.shape)
# (1,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    '70 % Cacao Dark Chocolate With Coconut',
    [
        'DRK CHCLT BAR, COCONUT',
        'Coconut Cream Filled Dark Chocolate',
        'Blueberry & Dark Chocolate With Chia',
    ],
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
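A common deployment pattern pairs this reranker with a fast bi-encoder: the bi-encoder retrieves a shortlist from the full catalogue, and the cross-encoder rescores that shortlist. The sketch below assumes a multilingual MiniLM bi-encoder as the retriever; that choice is purely illustrative and not part of this model card.

```python
from sentence_transformers import CrossEncoder, SentenceTransformer, util

# Stage 1: an inexpensive bi-encoder narrows the catalogue to a shortlist.
# The retriever here is an assumed, illustrative choice.
retriever = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
reranker = CrossEncoder("Antix5/product-reranker-mmBERT-small")

corpus = [
    "DRK CHCLT BAR, COCONUT",
    "Coconut Cream Filled Dark Chocolate",
    "Blueberry & Dark Chocolate With Chia",
    "Orange Juice From Concentrate With Pulp",
]
query = "70 % Cacao Dark Chocolate With Coconut"

corpus_embeddings = retriever.encode(corpus)
query_embedding = retriever.encode(query)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]
candidates = [corpus[hit["corpus_id"]] for hit in hits]

# Stage 2: the cross-encoder rescores the shortlist with full
# query-document attention, which is what this model was trained for.
scores = reranker.predict([(query, doc) for doc in candidates])
for doc, score in sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True):
    print(f"{score:.3f}  {doc}")
```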
## Evaluation

### Metrics

#### Cross Encoder Reranking

* Dataset: `rerank`
* Evaluated with [CrossEncoderRerankingEvaluator](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": false
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.9562 (-0.0359)     |
| mrr@10      | 0.9561 (-0.0385)     |
| **ndcg@10** | **0.9656 (-0.0291)** |

## Training Details

### Training Dataset

#### product_similarity_dataset

* Dataset: [product_similarity_dataset](https://huggingface.co/datasets/Antix5/Product_Similarity_Dataset) at [7aba3ef](https://huggingface.co/datasets/Antix5/Product_Similarity_Dataset/tree/7aba3ef7a1d4bb067b26c4cafb376f7447850314)
* Size: 9,358 training samples
* Columns: `query`, `documents`, and `scores`
* Column types (based on the first 1000 samples):

  |      | query  | documents | scores |
  |:-----|:-------|:----------|:-------|
  | type | string | list      | list   |

* Samples:

  | query | documents | scores |
  |:------|:----------|:-------|
  | Premier 26764 Car Spinner, Santa, 25 by 19-1/2-Inch | ['Premier 26764 Tourbillon pour voiture, Santa, 25 x 19-1/2 pouces', 'BNTS, ЧИПСЫ ИЗ ФАСОЛИ NV И МОРСКАЯ СОЛЬ', 'Beanitos, Чипс из фасоли navy, Сыр на чо', 'K2 स्केट व्हील (4 का पैक)', 'BLST BALL МЯЧ ДЛЯ КИКА (2 ШТ.)', ...] | [1.0, 0.0, 0.0, 0.0, 0.0, ...] |
  | Juice Cocktail Blend From Concentrate, Apple Blueberry | ['Mélange de cocktail de jus à base de concentré, pomme myrtille', 'Orange Juice From Concentrate With Pulp', 'Tropical Juice Splash From Concentrate', 'BLUEBERRY JUICE DRNK', 'APPLE NECTAR JUICE DRINK FROM CNCNTRT', ...] | [1.0, 0.4, 0.35, 0.65, 0.55, ...] |
  | Fruity Sour Strips Fruit-Flavored Chewy Candy | ['Fruity Sour Strips Fruit-Flavored Chewy Candy', 'SR CANDIES, FRUIT SOUR', 'Fruit Candy, Fruit', 'FRT SNCK TUTTI FRUITY', 'Fruit Strips, Peach Passion', ...] | [1.0, 0.95, 0.7, 0.55, 0.9, ...] |

* Loss: [LambdaLoss](https://sbert.net/docs/package_reference/cross_encoder/losses.html#lambdaloss) with these parameters:
  ```json
  {
      "weighting_scheme": "sentence_transformers.cross_encoder.losses.LambdaLoss.NDCGLoss2PPScheme",
      "k": null,
      "sigma": 1.0,
      "eps": 1e-10,
      "reduction_log": "binary",
      "activation_fn": "torch.nn.modules.linear.Identity",
      "mini_batch_size": null
  }
  ```
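Taken together, the dataset, the loss above, and the hyperparameters listed in the next subsection correspond roughly to the following training script. Treat it as a hedged sketch rather than the exact code used for this checkpoint: it assumes a recent sentence-transformers release (≥ 4.0) where the cross-encoder trainer and LambdaLoss live under `sentence_transformers.cross_encoder`, and `output_dir` is a placeholder.

```python
from datasets import load_dataset
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import LambdaLoss, NDCGLoss2PPScheme

# Listwise training data: each row holds a query, a list of documents,
# and a list of graded relevance scores (see the samples above).
train_dataset = load_dataset("Antix5/Product_Similarity_Dataset", split="train")

# Start from the multilingual mmBERT-small encoder with a single score head.
model = CrossEncoder("jhu-clsp/mmBERT-small", num_labels=1)

# LambdaLoss with the NDCG2++ weighting scheme, matching the loss config above.
loss = LambdaLoss(model, weighting_scheme=NDCGLoss2PPScheme())

args = CrossEncoderTrainingArguments(
    output_dir="mmbert-small-reranker",  # placeholder, not the original path
    num_train_epochs=1,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    gradient_checkpointing=True,
)

trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```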
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `load_best_model_at_end`: True
- `gradient_checkpointing`: True

#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: True
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>
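The `rerank_ndcg@10` column in the log below comes from the same `CrossEncoderRerankingEvaluator` described in the Evaluation section, run periodically during training. A minimal sketch of invoking it yourself follows; the toy sample is illustrative and not drawn from the actual evaluation split.

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderRerankingEvaluator

model = CrossEncoder("Antix5/product-reranker-mmBERT-small")

# Each sample needs the query, the known-relevant documents ("positive"),
# and the full candidate list to rerank ("documents").
samples = [
    {
        "query": "70 % Cacao Dark Chocolate With Coconut",
        "positive": ["Coconut Cream Filled Dark Chocolate"],
        "documents": [
            "DRK CHCLT BAR, COCONUT",
            "Coconut Cream Filled Dark Chocolate",
            "Blueberry & Dark Chocolate With Chia",
        ],
    },
]

evaluator = CrossEncoderRerankingEvaluator(
    samples=samples,
    at_k=10,
    always_rerank_positives=False,
    name="rerank",
)
print(evaluator(model))  # e.g. {'rerank_map': ..., 'rerank_mrr@10': ..., 'rerank_ndcg@10': ...}
```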
### Training Logs

| Epoch  | Step | Training Loss | rerank_ndcg@10   |
|:------:|:----:|:-------------:|:----------------:|
| 0.0009 | 1    | 0.1261        | -                |
| 0.0171 | 20   | 0.1193        | -                |
| 0.0342 | 40   | 0.0767        | -                |
| 0.0513 | 60   | 0.0563        | -                |
| 0.0684 | 80   | 0.055         | -                |
| 0.0855 | 100  | 0.0546        | -                |
| 0.1026 | 120  | 0.0483        | -                |
| 0.1197 | 140  | 0.0489        | -                |
| 0.1368 | 160  | 0.049         | -                |
| 0.1538 | 180  | 0.0463        | -                |
| 0.1709 | 200  | 0.046         | 0.9419 (-0.0528) |
| 0.1880 | 220  | 0.0411        | -                |
| 0.2051 | 240  | 0.0398        | -                |
| 0.2222 | 260  | 0.0456        | -                |
| 0.2393 | 280  | 0.0463        | -                |
| 0.2564 | 300  | 0.043         | -                |
| 0.2735 | 320  | 0.0447        | -                |
| 0.2906 | 340  | 0.0419        | -                |
| 0.3077 | 360  | 0.0403        | -                |
| 0.3248 | 380  | 0.0429        | -                |
| 0.3419 | 400  | 0.0423        | 0.9653 (-0.0294) |
| 0.3590 | 420  | 0.0406        | -                |
| 0.3761 | 440  | 0.041         | -                |
| 0.3932 | 460  | 0.0427        | -                |
| 0.4103 | 480  | 0.0376        | -                |
| 0.4274 | 500  | 0.0408        | -                |
| 0.4444 | 520  | 0.0394        | -                |
| 0.4615 | 540  | 0.0423        | -                |
| 0.4786 | 560  | 0.0403        | -                |
| 0.4957 | 580  | 0.0336        | -                |
| 0.5128 | 600  | 0.039         | 0.9668 (-0.0279) |
| 0.5299 | 620  | 0.0389        | -                |
| 0.5470 | 640  | 0.0376        | -                |
| 0.5641 | 660  | 0.0422        | -                |
| 0.5812 | 680  | 0.0406        | -                |
| 0.5983 | 700  | 0.037         | -                |
| 0.6154 | 720  | 0.0368        | -                |
| 0.6325 | 740  | 0.0365        | -                |
| 0.6496 | 760  | 0.0356        | -                |
| 0.6667 | 780  | 0.0359        | -                |
| 0.6838 | 800  | 0.0368        | 0.9646 (-0.0301) |
| 0.7009 | 820  | 0.0342        | -                |
| 0.7179 | 840  | 0.0376        | -                |
| 0.7350 | 860  | 0.036         | -                |
| 0.7521 | 880  | 0.0331        | -                |
| 0.7692 | 900  | 0.0341        | -                |
| 0.7863 | 920  | 0.0372        | -                |
| 0.8034 | 940  | 0.0361        | -                |
| 0.8205 | 960  | 0.0352        | -                |
| 0.8376 | 980  | 0.0351        | -                |
| 0.8547 | 1000 | 0.0348        | 0.9620 (-0.0327) |
| 0.8718 | 1020 | 0.0341        | -                |
| 0.8889 | 1040 | 0.0354        | -                |
| 0.9060 | 1060 | 0.035         | -                |
| 0.9231 | 1080 | 0.0325        | -                |
| 0.9402 | 1100 | 0.038         | -                |
| 0.9573 | 1120 | 0.0376        | -                |
| 0.9744 | 1140 | 0.0335        | -                |
| 0.9915 | 1160 | 0.0375        | -                |
| -1     | -1   | -             | 0.9656 (-0.0291) |

### Framework Versions

- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.1
- PyTorch: 2.8.0+cu126
- Accelerate: 1.10.1
- Datasets: 2.20.0
- Tokenizers: 0.22.0

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### LambdaLoss

```bibtex
@inproceedings{wang2018lambdaloss,
    title = {The LambdaLoss Framework for Ranking Metric Optimization},
    author = {Wang, Xuanhui and Li, Cheng and Golbandi, Nadav and Bendersky, Michael and Najork, Marc},
    booktitle = {Proceedings of the 27th ACM International Conference on Information and Knowledge Management},
    pages = {1313--1322},
    year = {2018}
}
```