---
language:
- multilingual
- fr
- de
- zh
- ru
- pl
- es
- it
- ja
- ar
- hi
- pt
- nl
license: mit
tags:
- sentence-transformers
- cross-encoder
- reranker
- generated_from_trainer
- dataset_size:9358
- loss:LambdaLoss
base_model: jhu-clsp/mmBERT-small
datasets:
- Antix5/Product_Similarity_Dataset
pipeline_tag: text-ranking
library_name: sentence-transformers
metrics:
- map
- mrr@10
- ndcg@10
model-index:
- name: mmBERT-small reranker (LambdaLoss NDCG2++)
  results:
  - task:
      type: cross-encoder-reranking
      name: Cross Encoder Reranking
    dataset:
      name: rerank
      type: rerank
    metrics:
    - type: map
      value: 0.9562
      name: Map
    - type: mrr@10
      value: 0.9561
      name: Mrr@10
    - type: ndcg@10
      value: 0.9656
      name: Ndcg@10
---

# mmBERT-small reranker (LambdaLoss NDCG2++)

This is a [Cross Encoder](https://www.sbert.net/docs/cross_encoder/usage/usage.html) model finetuned from [jhu-clsp/mmBERT-small](https://huggingface.co/jhu-clsp/mmBERT-small) on the [product_similarity_dataset](https://huggingface.co/datasets/Antix5/Product_Similarity_Dataset) dataset using the [sentence-transformers](https://www.SBERT.net) library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

## Model Details

### Model Description

- **Model Type:** Cross Encoder
- **Base model:** [jhu-clsp/mmBERT-small](https://huggingface.co/jhu-clsp/mmBERT-small)
- **Maximum Sequence Length:** 256 tokens
- **Number of Output Labels:** 1 label
- **Training Dataset:**
  - [product_similarity_dataset](https://huggingface.co/datasets/Antix5/Product_Similarity_Dataset)
- **Languages:** multilingual, fr, de, zh, ru, pl, es, it, ja, ar, hi, pt, nl
- **License:** mit

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Documentation:** [Cross Encoder Documentation](https://www.sbert.net/docs/cross_encoder/usage/usage.html)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Cross Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=cross-encoder)

**Warning:** This model has only just begun training; this upload is an early checkpoint.

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("Antix5/product-reranker-mmBERT-small")

# Get scores for pairs of texts
pairs = [
    ['Milk Belgian Chocolate', 'Milk Chocolate Flavor'],
]
scores = model.predict(pairs)
print(scores.shape)
# (1,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    '70 % Cacao Dark Chocolate With Coconut',
    [
        'DRK CHCLT BAR, COCONUT',
        'Coconut Cream Filled Dark Chocolate',
        'Blueberry & Dark Chocolate With Chia',
    ],
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
```
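A common deployment pattern pairs this reranker with a fast bi-encoder: the bi-encoder retrieves a shortlist from the full catalogue, and the cross-encoder rescores that shortlist. The sketch below assumes a multilingual MiniLM bi-encoder as the retriever; that choice is purely illustrative and not part of this model card.

```python
from sentence_transformers import CrossEncoder, SentenceTransformer, util

# Stage 1: an inexpensive bi-encoder narrows the catalogue to a shortlist.
# The retriever here is an assumed, illustrative choice.
retriever = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
reranker = CrossEncoder("Antix5/product-reranker-mmBERT-small")

corpus = [
    "DRK CHCLT BAR, COCONUT",
    "Coconut Cream Filled Dark Chocolate",
    "Blueberry & Dark Chocolate With Chia",
    "Orange Juice From Concentrate With Pulp",
]
query = "70 % Cacao Dark Chocolate With Coconut"

corpus_embeddings = retriever.encode(corpus)
query_embedding = retriever.encode(query)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=3)[0]
candidates = [corpus[hit["corpus_id"]] for hit in hits]

# Stage 2: the cross-encoder rescores the shortlist with full
# query-document attention, which is what this model was trained for.
scores = reranker.predict([(query, doc) for doc in candidates])
for doc, score in sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True):
    print(f"{score:.3f}  {doc}")
```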
## Evaluation

### Metrics

#### Cross Encoder Reranking

* Dataset: `rerank`
* Evaluated with [CrossEncoderRerankingEvaluator](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html#sentence_transformers.cross_encoder.evaluation.CrossEncoderRerankingEvaluator) with these parameters:
  ```json
  {
      "at_k": 10,
      "always_rerank_positives": false
  }
  ```

| Metric      | Value                |
|:------------|:---------------------|
| map         | 0.9562 (-0.0359)     |
| mrr@10      | 0.9561 (-0.0385)     |
| **ndcg@10** | **0.9656 (-0.0291)** |

## Training Details

### Training Dataset

#### product_similarity_dataset

* Dataset: [product_similarity_dataset](https://huggingface.co/datasets/Antix5/Product_Similarity_Dataset) at [7aba3ef](https://huggingface.co/datasets/Antix5/Product_Similarity_Dataset/tree/7aba3ef7a1d4bb067b26c4cafb376f7447850314)
* Size: 9,358 training samples
* Columns: `query`, `documents`, and `scores`
* Column types (based on the first 1000 samples):

  |      | query  | documents | scores |
  |:-----|:-------|:----------|:-------|
  | type | string | list      | list   |

* Samples:

  | query | documents | scores |
  |:------|:----------|:-------|
  | Premier 26764 Car Spinner, Santa, 25 by 19-1/2-Inch | ['Premier 26764 Tourbillon pour voiture, Santa, 25 x 19-1/2 pouces', 'BNTS, ЧИПСЫ ИЗ ФАСОЛИ NV И МОРСКАЯ СОЛЬ', 'Beanitos, Чипс из фасоли navy, Сыр на чо', 'K2 स्केट व्हील (4 का पैक)', 'BLST BALL МЯЧ ДЛЯ КИКА (2 ШТ.)', ...] | [1.0, 0.0, 0.0, 0.0, 0.0, ...] |
  | Juice Cocktail Blend From Concentrate, Apple Blueberry | ['Mélange de cocktail de jus à base de concentré, pomme myrtille', 'Orange Juice From Concentrate With Pulp', 'Tropical Juice Splash From Concentrate', 'BLUEBERRY JUICE DRNK', 'APPLE NECTAR JUICE DRINK FROM CNCNTRT', ...] | [1.0, 0.4, 0.35, 0.65, 0.55, ...] |
  | Fruity Sour Strips Fruit-Flavored Chewy Candy | ['Fruity Sour Strips Fruit-Flavored Chewy Candy', 'SR CANDIES, FRUIT SOUR', 'Fruit Candy, Fruit', 'FRT SNCK TUTTI FRUITY', 'Fruit Strips, Peach Passion', ...] | [1.0, 0.95, 0.7, 0.55, 0.9, ...] |

* Loss: [LambdaLoss](https://sbert.net/docs/package_reference/cross_encoder/losses.html#lambdaloss) with these parameters:
  ```json
  {
      "weighting_scheme": "sentence_transformers.cross_encoder.losses.LambdaLoss.NDCGLoss2PPScheme",
      "k": null,
      "sigma": 1.0,
      "eps": 1e-10,
      "reduction_log": "binary",
      "activation_fn": "torch.nn.modules.linear.Identity",
      "mini_batch_size": null
  }
  ```
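Taken together, the dataset, the loss above, and the hyperparameters listed in the next subsection correspond roughly to the following training script. Treat it as a hedged sketch rather than the exact code used for this checkpoint: it assumes a recent sentence-transformers release (≥ 4.0) where the cross-encoder trainer and LambdaLoss live under `sentence_transformers.cross_encoder`, and `output_dir` is a placeholder.

```python
from datasets import load_dataset
from sentence_transformers.cross_encoder import (
    CrossEncoder,
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)
from sentence_transformers.cross_encoder.losses import LambdaLoss, NDCGLoss2PPScheme

# Listwise training data: each row holds a query, a list of documents,
# and a list of graded relevance scores (see the samples above).
train_dataset = load_dataset("Antix5/Product_Similarity_Dataset", split="train")

# Start from the multilingual mmBERT-small encoder with a single score head.
model = CrossEncoder("jhu-clsp/mmBERT-small", num_labels=1)

# LambdaLoss with the NDCG2++ weighting scheme, matching the loss config above.
loss = LambdaLoss(model, weighting_scheme=NDCGLoss2PPScheme())

args = CrossEncoderTrainingArguments(
    output_dir="mmbert-small-reranker",  # placeholder, not the original path
    num_train_epochs=1,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=True,
    gradient_checkpointing=True,
)

trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
```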
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `learning_rate`: 2e-05
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `load_best_model_at_end`: True
- `gradient_checkpointing`: True

#### All Hyperparameters

<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 8
- `per_device_eval_batch_size`: 8
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: True
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>
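The `rerank_ndcg@10` column in the log below comes from the same `CrossEncoderRerankingEvaluator` described in the Evaluation section, run periodically during training. A minimal sketch of invoking it yourself follows; the toy sample is illustrative and not drawn from the actual evaluation split.

```python
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.evaluation import CrossEncoderRerankingEvaluator

model = CrossEncoder("Antix5/product-reranker-mmBERT-small")

# Each sample needs the query, the known-relevant documents ("positive"),
# and the full candidate list to rerank ("documents").
samples = [
    {
        "query": "70 % Cacao Dark Chocolate With Coconut",
        "positive": ["Coconut Cream Filled Dark Chocolate"],
        "documents": [
            "DRK CHCLT BAR, COCONUT",
            "Coconut Cream Filled Dark Chocolate",
            "Blueberry & Dark Chocolate With Chia",
        ],
    },
]

evaluator = CrossEncoderRerankingEvaluator(
    samples=samples,
    at_k=10,
    always_rerank_positives=False,
    name="rerank",
)
print(evaluator(model))  # e.g. {'rerank_map': ..., 'rerank_mrr@10': ..., 'rerank_ndcg@10': ...}
```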
### Training Logs

| Epoch  | Step | Training Loss | rerank_ndcg@10   |
|:------:|:----:|:-------------:|:----------------:|
| 0.0009 | 1    | 0.1261        | -                |
| 0.0171 | 20   | 0.1193        | -                |
| 0.0342 | 40   | 0.0767        | -                |
| 0.0513 | 60   | 0.0563        | -                |
| 0.0684 | 80   | 0.055         | -                |
| 0.0855 | 100  | 0.0546        | -                |
| 0.1026 | 120  | 0.0483        | -                |
| 0.1197 | 140  | 0.0489        | -                |
| 0.1368 | 160  | 0.049         | -                |
| 0.1538 | 180  | 0.0463        | -                |
| 0.1709 | 200  | 0.046         | 0.9419 (-0.0528) |
| 0.1880 | 220  | 0.0411        | -                |
| 0.2051 | 240  | 0.0398        | -                |
| 0.2222 | 260  | 0.0456        | -                |
| 0.2393 | 280  | 0.0463        | -                |
| 0.2564 | 300  | 0.043         | -                |
| 0.2735 | 320  | 0.0447        | -                |
| 0.2906 | 340  | 0.0419        | -                |
| 0.3077 | 360  | 0.0403        | -                |
| 0.3248 | 380  | 0.0429        | -                |
| 0.3419 | 400  | 0.0423        | 0.9653 (-0.0294) |
| 0.3590 | 420  | 0.0406        | -                |
| 0.3761 | 440  | 0.041         | -                |
| 0.3932 | 460  | 0.0427        | -                |
| 0.4103 | 480  | 0.0376        | -                |
| 0.4274 | 500  | 0.0408        | -                |
| 0.4444 | 520  | 0.0394        | -                |
| 0.4615 | 540  | 0.0423        | -                |
| 0.4786 | 560  | 0.0403        | -                |
| 0.4957 | 580  | 0.0336        | -                |
| 0.5128 | 600  | 0.039         | 0.9668 (-0.0279) |
| 0.5299 | 620  | 0.0389        | -                |
| 0.5470 | 640  | 0.0376        | -                |
| 0.5641 | 660  | 0.0422        | -                |
| 0.5812 | 680  | 0.0406        | -                |
| 0.5983 | 700  | 0.037         | -                |
| 0.6154 | 720  | 0.0368        | -                |
| 0.6325 | 740  | 0.0365        | -                |
| 0.6496 | 760  | 0.0356        | -                |
| 0.6667 | 780  | 0.0359        | -                |
| 0.6838 | 800  | 0.0368        | 0.9646 (-0.0301) |
| 0.7009 | 820  | 0.0342        | -                |
| 0.7179 | 840  | 0.0376        | -                |
| 0.7350 | 860  | 0.036         | -                |
| 0.7521 | 880  | 0.0331        | -                |
| 0.7692 | 900  | 0.0341        | -                |
| 0.7863 | 920  | 0.0372        | -                |
| 0.8034 | 940  | 0.0361        | -                |
| 0.8205 | 960  | 0.0352        | -                |
| 0.8376 | 980  | 0.0351        | -                |
| 0.8547 | 1000 | 0.0348        | 0.9620 (-0.0327) |
| 0.8718 | 1020 | 0.0341        | -                |
| 0.8889 | 1040 | 0.0354        | -                |
| 0.9060 | 1060 | 0.035         | -                |
| 0.9231 | 1080 | 0.0325        | -                |
| 0.9402 | 1100 | 0.038         | -                |
| 0.9573 | 1120 | 0.0376        | -                |
| 0.9744 | 1140 | 0.0335        | -                |
| 0.9915 | 1160 | 0.0375        | -                |
| -1     | -1   | -             | 0.9656 (-0.0291) |

### Framework Versions

- Python: 3.12.11
- Sentence Transformers: 5.1.0
- Transformers: 4.56.1
- PyTorch: 2.8.0+cu126
- Accelerate: 1.10.1
- Datasets: 2.20.0
- Tokenizers: 0.22.0

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### LambdaLoss

```bibtex
@inproceedings{wang2018lambdaloss,
    title = {The LambdaLoss Framework for Ranking Metric Optimization},
    author = {Wang, Xuanhui and Li, Cheng and Golbandi, Nadav and Bendersky, Michael and Najork, Marc},
    booktitle = {Proceedings of the 27th ACM International Conference on Information and Knowledge Management},
    pages = {1313--1322},
    year = {2018}
}
```