# SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a [sentence-transformers](https://www.sbert.net) model finetuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for retrieval.
## Model Details

### Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/all-MiniLM-L6-v2
- Maximum Sequence Length: 256 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
- Supported Modality: Text
### Model Sources

- Documentation: [Sentence Transformers Documentation](https://sbert.net)
- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Normalize({})
)
```
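The three stages can be sketched numerically: the Transformer emits one 384-dimensional embedding per token, the Pooling layer mean-pools them (masking out padding) into a single sentence vector, and Normalize rescales it to unit length. A minimal NumPy sketch, with random values standing in for real token embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage (0) output: token embeddings for a 7-token input, 384 dims each
token_embeddings = rng.normal(size=(7, 384))
attention_mask = np.ones(7)  # all 7 tokens are real (no padding)

# Stage (1): mean pooling over non-padding tokens
masked = token_embeddings * attention_mask[:, None]
sentence_embedding = masked.sum(axis=0) / attention_mask.sum()

# Stage (2): L2 normalization, so cosine similarity reduces to a dot product
sentence_embedding = sentence_embedding / np.linalg.norm(sentence_embedding)

print(sentence_embedding.shape)            # (384,)
print(np.linalg.norm(sentence_embedding))  # 1.0
```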
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ronit01/rag_tuned_minilm")

# Run inference
sentences = [
    "How does RapidFire AI's adaptive execution engine differ from traditional sequential execution for multi-config experiments?",
    'The crux of RapidFire AI\'s difference is in its *adaptive execution engine*: it enables "interruptible"\nexecution of configurations across GPUs/CPUs. To do so, it first shards the training and/or evaluation \ndataset randomly into "chunks" (also called "shards").\nThen instead of waiting for a run to see the whole dataset for all epochs (for SFT/RFT) or for full \neval metrics calculation (for RAG evals), RapidFire AI schedules all runs on *one shard at a time*, \nand then cycles through all shards.\n\nSuppose you have only 1 GPU, say an A100 or H100, and you want to run SFT on a Llama model. \nCurrent tools force you to run one config after another *sequentially* as shown in the (simplified) illustration below. \nIn contrast, by operating on shards, RapidFire AI offers a far more concurrent learning experience by \nautomatically *swapping* adapters (and base models, if needed) across GPU(s) and DRAM. \nIt does this via efficient shared memory-based caching mechanisms that can spill to disk when needed.\n\n.. image:: /images/gantt-1gpu.png\n :width: 800px\n\nIn the above figure, all 3 model configs are shown for 1 epoch. RapidFire AI is set to use 4 chunks.\nSo, before model config 3 (M3) even starts in the sequential approach, RapidFire AI already shows you \nthe learning behaviors of all 3 configs on the first 2-3 chunks. \nThe overhead of swapping, represented by the thin gray box, is minimal, less than 5% of the runtime,\nas per our measurements--thanks to our new efficient memory management techniques.\n\nFor inference evals for RAG/context engineering, such sharded execution means RapidFire AI surfaces eval metrics \nsooner based on a statistical technique known as *online aggregation* from the database systems literature.\nBasically, see estimated values and confidence intervals for all eval metrics in real time as the shards \nget processed, ultimately converging to the exact metrics on the full dataset.',
    'We currently support two common knob set generators: :func:`List()` for a discrete \nset of values and :func:`Range()` for sampling from a continuous value interval.\n\n\n.. py:function:: List(values: List[Any])\n\n\t:param values: List of discrete values for a knob; all values must be the same python data type.\n\t:type values: List[Any]\n\n\n.. py:function:: Range(start: int | float, end: int | float, dtype: str = "int" | "float")\n\n\t:param start: Lower bound of range interval.\n\t:type start: int | float\n\n\t:param end: Upper bound of range interval.\n\t:type end: int | float\n\n\t:param dtype: Data type of value to be sampled, either :code:`"int"` or :code:`"float"`.\n\t:type dtype: str\n\n\n**Notes:**\n\nAs of this writing, :func:`Range()` performs uniform sampling within the given interval. \nWe plan to continue expanding this API and add more functionality on this front based on feedback.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7594, 0.3727],
#         [0.7594, 1.0000, 0.2782],
#         [0.3727, 0.2782, 1.0000]])
```
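Because the model ends with a Normalize module, every embedding has unit length, so the cosine-similarity matrix above is simply the matrix product of the embeddings with their transpose. A quick NumPy check of that equivalence, using small invented vectors rather than real model output:

```python
import numpy as np

# Stand-in "embeddings": 3 vectors, normalized to unit length
v = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]])
v = v / np.linalg.norm(v, axis=1, keepdims=True)

# Dot products of unit vectors...
dot_sims = v @ v.T

# ...equal explicit cosine similarity
norms = np.linalg.norm(v, axis=1)
cos_sims = (v @ v.T) / np.outer(norms, norms)

print(np.allclose(dot_sims, cos_sims))       # True
print(np.allclose(np.diag(dot_sims), 1.0))   # True: each vector vs. itself
```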
## Training Details

### Training Dataset

#### Unnamed Dataset

- Size: 208 training samples
- Columns: `sentence_0`, `sentence_1`, and `label`
- Approximate statistics based on the first 208 samples:

  |         | sentence_0 | sentence_1 | label |
  |:--------|:-----------|:-----------|:------|
  | type    | string     | string     | float |
  | details | min: 11 tokens<br>mean: 24.87 tokens<br>max: 34 tokens | min: 31 tokens<br>mean: 218.51 tokens<br>max: 256 tokens | min: 0.0<br>mean: 0.25<br>max: 1.0 |
- Samples:

  | sentence_0 | sentence_1 | label |
  |:-----------|:-----------|:------|
  | What is the difference between distributive and algebraic metrics in RapidFire AI's online aggregation for evals? | Note that if you plan to use only OpenAI APIs and not self-hosted models (for embedding or generation), you do NOT need GPUs on your machine. But you must provide a valid OpenAI API key via a config argument as shown in the GSM8K and SciFact tutorial notebooks.<br>Step 1: Install dependencies and package<br>Obtain the RapidFire AI OSS package from pypi (includes all dependencies) and ensure it is installed correctly.<br>.. important:: Requires Python 3.12+. Ensure that python3 resolves to Python 3.12 before creating the venv.<br>.. code-block:: bash<br>python3 --version # must be 3.12.x<br>python3 -m venv .venv<br>source .venv/bin/activate<br>pip install rapidfireai<br>rapidfireai --version<br>Verify it prints the following: RapidFire AI 0.14.0<br>Due to current issue: https://github.com/huggingface/xet-core/issues/527<br>pip uninstall -y hf-xet<br>The tutorial notebooks for RAG evals do not use any gated models from Hugging Face. If you want to a... | 0.0 |
  | What is a 'leaf config' in RapidFire AI terminology, and how does it relate to runs? | Note that if you plan to use only OpenAI APIs and not self-hosted models (for embedding or generation), you do NOT need GPUs on your machine. But you must provide a valid OpenAI API key via a config argument as shown in the GSM8K and SciFact tutorial notebooks.<br>Step 1: Install dependencies and package<br>Obtain the RapidFire AI OSS package from pypi (includes all dependencies) and ensure it is installed correctly.<br>.. important:: Requires Python 3.12+. Ensure that python3 resolves to Python 3.12 before creating the venv.<br>.. code-block:: bash<br>python3 --version # must be 3.12.x<br>python3 -m venv .venv<br>source .venv/bin/activate<br>pip install rapidfireai<br>rapidfireai --version<br>Verify it prints the following: RapidFire AI 0.14.0<br>Due to current issue: https://github.com/huggingface/xet-core/issues/527<br>pip uninstall -y hf-xet<br>The tutorial notebooks for RAG evals do not use any gated models from Hugging Face. If you want to a... | 0.0 |
  | What training-specific arguments can you configure in RFSFTConfig, and how does it relate to HuggingFace TRL? | This use case notebook features a hybrid workflow spanning a self-hosted open LLM for embeddings and an OpenAI call for generation.<br>Task, Dataset, and Prompt<br>This tutorial shows few-shot prompting as part of context engineering for solving grade school math word problems. It uses the "GSM8K" dataset; see its details here <https://huggingface.co/datasets/openai/gsm8k>__. The dataset contains grade school math word problems requiring multi-step reasoning. The prompt format includes system instructions defining the assistant as a math problem solver, semantically selected few-shot examples, and the target question to solve.<br>Model, Few-Shot Selection, and Configuration Knobs<br>We compare 2 generator models via OpenAI API: gpt-5-mini and gpt-4o. There are 2 different reasoning effort levels for the first model only: medium and high.<br>The few-shot prompting pipeline uses:<br>Example Selection: Semantic similarity-based selection using sentence-transformers/... | 0.0 |
#### Loss

`ContrastiveLoss` with these parameters:

```json
{
    "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
    "margin": 0.5,
    "size_average": true
}
```
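ContrastiveLoss (Hadsell et al., 2006) pulls positive pairs (label 1) together and pushes negative pairs (label 0) apart until their distance exceeds the margin. A minimal NumPy sketch following the Hadsell-style formulation, per-pair loss 0.5 · [y·d² + (1−y)·max(0, margin−d)²] with cosine distance d and the parameters above (margin 0.5, batch-averaged); the vectors are invented toy embeddings, not real model output:

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, labels, margin=0.5):
    """Hadsell-style contrastive loss with cosine distance.

    labels: 1.0 for similar pairs, 0.0 for dissimilar pairs.
    """
    emb_a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    emb_b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    distances = 1.0 - np.sum(emb_a * emb_b, axis=1)  # cosine distance

    # Similar pairs are penalized by their distance; dissimilar pairs
    # only while they are still closer than the margin.
    per_pair = 0.5 * (labels * distances**2
                      + (1 - labels) * np.maximum(0.0, margin - distances)**2)
    return per_pair.mean()  # size_average: true

a = np.array([[1.0, 0.0], [1.0, 0.0]])
b = np.array([[1.0, 0.0], [0.8, 0.6]])  # identical pair, then a close pair
labels = np.array([1.0, 0.0])

# Only the mislabeled-close pair contributes: 0.5 * (0.5 - 0.2)^2 / 2
print(round(contrastive_loss(a, b, labels), 4))  # 0.0225
```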
### Training Hyperparameters

#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `multi_dataset_batch_sampler`: round_robin
#### All Hyperparameters

<details><summary>Click to expand</summary>

- `do_predict`: False
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: None
- `warmup_ratio`: None
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `enable_jit_checkpoint`: False
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `use_cpu`: False
- `seed`: 42
- `data_seed`: None
- `bf16`: False
- `fp16`: False
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: -1
- `ddp_backend`: None
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `parallelism_config`: None
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `group_by_length`: False
- `length_column_name`: length
- `project`: huggingface
- `trackio_space_id`: trackio
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `hub_revision`: None
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `auto_find_batch_size`: False
- `full_determinism`: False
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_num_input_tokens_seen`: no
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `liger_kernel_config`: None
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: True
- `use_cache`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
- `router_mapping`: {}
- `learning_rate_mapping`: {}

</details>
### Training Time

- Training: 2.6 seconds
### Framework Versions
- Python: 3.12.13
- Sentence Transformers: 5.4.1
- Transformers: 5.0.0
- PyTorch: 2.10.0+cu128
- Accelerate: 1.13.0
- Datasets: 4.0.0
- Tokenizers: 0.22.2
## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
#### ContrastiveLoss

```bibtex
@inproceedings{hadsell2006dimensionality,
    author = {Hadsell, R. and Chopra, S. and LeCun, Y.},
    booktitle = {2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
    title = {Dimensionality Reduction by Learning an Invariant Mapping},
    year = {2006},
    volume = {2},
    pages = {1735-1742},
    doi = {10.1109/CVPR.2006.100}
}
```