SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2

This is a sentence-transformers model finetuned from sentence-transformers/all-MiniLM-L6-v2. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for retrieval tasks such as semantic search.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/all-MiniLM-L6-v2
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 384 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text
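
The sequence length, dimensionality, and similarity function listed above can be checked directly on the loaded model. A minimal sketch, assuming the Hub download shown in Usage below works in your environment:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ronit01/rag_tuned_minilm")
print(model.max_seq_length)                      # 256; longer inputs are truncated
print(model.get_sentence_embedding_dimension())  # 384
print(model.similarity_fn_name)                  # "cosine"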

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'BertModel'})
  (1): Pooling({'embedding_dimension': 384, 'pooling_mode': 'mean', 'include_prompt': True})
  (2): Normalize({})
)
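
The stack above is: (0) a BERT transformer producing token embeddings, (1) mask-aware mean pooling into a single 384-dimensional vector, and (2) L2 normalization, so cosine similarity reduces to a dot product. As a rough sketch of what these modules compute, using plain transformers (an approximation for illustration, not the library's internal code):

import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("ronit01/rag_tuned_minilm")
model = AutoModel.from_pretrained("ronit01/rag_tuned_minilm")

batch = tokenizer(["an example sentence"], padding=True, truncation=True,
                  max_length=256, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # (0) Transformer

# (1) Pooling: mean over real (non-padding) tokens only
mask = batch["attention_mask"].unsqueeze(-1).float()
emb = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

# (2) Normalize: unit-length vectors, so cosine similarity equals a dot product
emb = F.normalize(emb, p=2, dim=1)
print(emb.shape)  # torch.Size([1, 384])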

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ronit01/rag_tuned_minilm")
# Run inference
sentences = [
    "How does RapidFire AI's adaptive execution engine differ from traditional sequential execution for multi-config experiments?",
    'The crux of RapidFire AI\'s difference is in its *adaptive execution engine*: it enables "interruptible"\nexecution of configurations across GPUs/CPUs. To do so, it first shards the training and/or evaluation \ndataset randomly into "chunks" (also called "shards").\nThen instead of waiting for a run to see the whole dataset for all epochs (for SFT/RFT) or for full \neval metrics calculation (for RAG evals), RapidFire AI schedules all runs on *one shard at a time*, \nand then cycles through all shards.\n\nSuppose you have only 1 GPU, say an A100 or H100, and you want to run SFT on a Llama model. \nCurrent tools force you to run one config after another *sequentially* as shown in the (simplified) illustration below. \nIn contrast, by operating on shards, RapidFire AI offers a far more concurrent learning experience by \nautomatically *swapping* adapters (and base models, if needed) across GPU(s) and DRAM. \nIt does this via efficient shared memory-based caching mechanisms that can spill to disk when needed.\n\n.. image:: /images/gantt-1gpu.png\n   :width: 800px\n\nIn the above figure, all 3 model configs are shown for 1 epoch. RapidFire AI is set to use 4 chunks.\nSo, before model config 3 (M3) even starts in the sequential approach, RapidFire AI already shows you \nthe learning behaviors of all 3 configs on the first 2-3 chunks. \nThe overhead of swapping, represented by the thin gray box, is minimal, less than 5% of the runtime,\nas per our measurements--thanks to our new efficient memory management techniques.\n\nFor inference evals for RAG/context engineering, such sharded execution means RapidFire AI surfaces eval metrics \nsooner based on a statistical technique known as *online aggregation* from the database systems literature.\nBasically, see estimated values and confidence intervals for all eval metrics in real time as the shards \nget processed, ultimately converging to the exact metrics on the full dataset.',
    'We currently support two common knob set generators: :func:`List()` for a discrete \nset of values and :func:`Range()` for sampling from a continuous value interval.\n\n\n.. py:function:: List(values: List[Any])\n\n\t:param values: List of discrete values for a knob; all values must be the same python data type.\n\t:type values: List[Any]\n\n\n.. py:function:: Range(start: int | float, end: int | float, dtype: str = "int" | "float")\n\n\t:param start: Lower bound of range interval.\n\t:type start: int | float\n\n\t:param end: Upper bound of range interval.\n\t:type end: int | float\n\n\t:param dtype: Data type of value to be sampled, either :code:`"int"` or :code:`"float"`.\n\t:type dtype: str\n\n\n**Notes:**\n\nAs of this writing, :func:`Range()` performs uniform sampling within the given interval. \nWe plan to continue expanding this API and add more functionality on this front based on feedback.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.7594, 0.3727],
#         [0.7594, 1.0000, 0.2782],
#         [0.3727, 0.2782, 1.0000]])
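
Since the card targets retrieval, the same embeddings can drive a small semantic search. A sketch with a made-up corpus and query (util.semantic_search ranks corpus entries by cosine similarity):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ronit01/rag_tuned_minilm")

# Illustrative mini-corpus; replace with your own passages
corpus = [
    "RapidFire AI shards the dataset into chunks and cycles all runs through them.",
    "List() and Range() are the two supported knob set generators.",
    "Swapping overhead stays minimal thanks to shared memory-based caching.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("How are multiple configs scheduled on one GPU?",
                               convert_to_tensor=True)

# Top-2 hits; each hit is a dict with 'corpus_id' and 'score'
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.4f}  {corpus[hit['corpus_id']]}")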

Training Details

Training Dataset

Unnamed Dataset

  • Size: 208 training samples

  • Columns: sentence_0, sentence_1, and label

  • Approximate statistics based on the first 208 samples:

    sentence_0 (string): min 11 tokens, mean 24.87 tokens, max 34 tokens
    sentence_1 (string): min 31 tokens, mean 218.51 tokens, max 256 tokens
    label (float): min 0.0, mean 0.25, max 1.0
  • Samples:

    All three sampled rows happen to be negative pairs (label 0.0):

    Sample 1
      sentence_0: What is the difference between distributive and algebraic metrics in RapidFire AI's online aggregation for evals?
      sentence_1: Note that if you plan to use only OpenAI APIs and not self-hosted models (for embedding or generation), you do NOT need GPUs on your machine. But you must provide a valid OpenAI API key via a config argument as shown in the GSM8K and SciFact tutorial notebooks. Step 1: Install dependencies and package. Obtain the RapidFire AI OSS package from pypi (includes all dependencies) and ensure it is installed correctly. Requires Python 3.12+; ensure that python3 resolves to Python 3.12 before creating the venv (python3 --version, python3 -m venv .venv, source .venv/bin/activate). Then pip install rapidfireai and run rapidfireai --version; verify it prints: RapidFire AI 0..14.0. Due to current issue https://github.com/huggingface/xet-core/issues/527, run pip uninstall -y hf-xet. The tutorial notebooks for RAG evals do not use any gated models from Hugging Face. If you want to a...
      label: 0.0

    Sample 2
      sentence_0: What is a 'leaf config' in RapidFire AI terminology, and how does it relate to runs?
      sentence_1: (the same installation passage as in Sample 1)
      label: 0.0

    Sample 3
      sentence_0: What training-specific arguments can you configure in RFSFTConfig, and how does it relate to HuggingFace TRL?
      sentence_1: This use case notebook features a hybrid workflow spanning a self-hosted open LLM for embeddings and an OpenAI call for generation. Task, Dataset, and Prompt: this tutorial shows few-shot prompting as part of context engineering for solving grade school math word problems. It uses the "GSM8K" dataset (https://huggingface.co/datasets/openai/gsm8k), which contains grade school math word problems requiring multi-step reasoning. The prompt format includes system instructions defining the assistant as a math problem solver, semantically selected few-shot examples, and the target question to solve. Model, Few-Shot Selection, and Configuration Knobs: we compare 2 generator models via the OpenAI API, gpt-5-mini and gpt-4o, with 2 reasoning effort levels (medium and high) for the first model only. The few-shot prompting pipeline uses Example Selection: semantic similarity-based selection using sentence-transformers/...
      label: 0.0
  • Loss: ContrastiveLoss with these parameters:

    {
        "distance_metric": "SiameseDistanceMetric.COSINE_DISTANCE",
        "margin": 0.5,
        "size_average": true
    }
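
ContrastiveLoss pulls positive pairs (label 1.0) together and pushes negative pairs (label 0.0) apart until their cosine distance exceeds the 0.5 margin. For reference, a minimal training sketch with this exact loss configuration; the toy rows are placeholders, not the actual training data:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import ContrastiveLoss, SiameseDistanceMetric

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder (question, passage, label) rows; 1.0 = relevant pair, 0.0 = not
train_dataset = Dataset.from_dict({
    "sentence_0": ["a question", "another question"],
    "sentence_1": ["a passage answering it", "an unrelated passage"],
    "label": [1.0, 0.0],
})

# Same parameters as reported above
loss = ContrastiveLoss(
    model=model,
    distance_metric=SiameseDistanceMetric.COSINE_DISTANCE,
    margin=0.5,
    size_average=True,
)

trainer = SentenceTransformerTrainer(model=model, train_dataset=train_dataset, loss=loss)
trainer.train()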
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 1
  • multi_dataset_batch_sampler: round_robin
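
These map one-to-one onto SentenceTransformerTrainingArguments, which could be passed to the trainer sketched above. A sketch (output_dir is a placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="models/rag_tuned_minilm",  # placeholder path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=1,
    multi_dataset_batch_sampler="round_robin",  # only relevant with multiple datasets
)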

All Hyperparameters

  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_ratio: None
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • enable_jit_checkpoint: False
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • use_cpu: False
  • seed: 42
  • data_seed: None
  • bf16: False
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: -1
  • ddp_backend: None
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • group_by_length: False
  • length_column_name: length
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • auto_find_batch_size: False
  • full_determinism: False
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • use_cache: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Time

  • Training: 2.6 seconds

Framework Versions

  • Python: 3.12.13
  • Sentence Transformers: 5.4.1
  • Transformers: 5.0.0
  • PyTorch: 2.10.0+cu128
  • Accelerate: 1.13.0
  • Datasets: 4.0.0
  • Tokenizers: 0.22.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ContrastiveLoss

@inproceedings{hadsell2006dimensionality,
    author={Hadsell, R. and Chopra, S. and LeCun, Y.},
    booktitle={2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)},
    title={Dimensionality Reduction by Learning an Invariant Mapping},
    year={2006},
    volume={2},
    number={},
    pages={1735-1742},
    doi={10.1109/CVPR.2006.100}
}