jina-embeddings-v5-text collection
Our 5th-gen embeddings: two lightweight multilingual models with SOTA performance in retrieval, matching, clustering, and classification.
MLX port of Jina AI's v5-text-small-text-matching embedding model for Apple Silicon.
Elastic Inference Service | ArXiv | Blog
pip install mlx tokenizers huggingface_hub
The fastest way to use v5-text in production. Elastic Inference Service (EIS) provides managed embedding inference with built-in scaling, so you can generate embeddings directly within your Elastic deployment.
PUT _inference/text_embedding/jina-v5
{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v5-text-small"
  }
}
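Once the endpoint exists, embeddings can be generated through the same inference API. A minimal sketch, assuming the `jina-v5` endpoint name from above:

```json
POST _inference/text_embedding/jina-v5
{
  "input": ["What is machine learning?"]
}
```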
See the Elastic Inference Service documentation for setup details.
import mlx.core as mx
from tokenizers import Tokenizer
from model import JinaEmbeddingModel
import json
# Load config
with open("config.json") as f:
    config = json.load(f)
# Load model (full precision)
model = JinaEmbeddingModel(config)
weights = mx.load("model.safetensors")
model.load_weights(list(weights.items()))
# Load tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")
# Encode
texts = ["Query: What is machine learning?", "Document: Machine learning is..."]
embeddings = model.encode(texts, tokenizer, task_type="text-matching.query")
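Once embeddings are computed, query-passage relevance is typically scored with cosine similarity. A minimal, self-contained sketch using stand-in NumPy vectors (in practice the inputs would be rows of the `embeddings` array returned by `encode` above):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between each row of a and each row of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Stand-in vectors; real embeddings would come from model.encode(...)
query = np.array([[0.1, 0.3, 0.5]])
docs = np.array([[0.1, 0.3, 0.5], [0.9, -0.2, 0.1]])

scores = cosine_sim(query, docs)          # shape (1, 2)
best = int(np.argmax(scores))             # index of the most similar document
```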
For the text-matching variant, two task types are available:
text-matching.query - for search queries
text-matching.passage - for documents/passages
Supports Matryoshka embedding truncation to 32, 64, 128, 256, 512, 768, or 1024 dimensions:
# Get 256-dim embedding
embeddings_256 = embeddings[:, :256]
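One caveat worth noting: after Matryoshka truncation the vectors are no longer unit-length, so they should be re-normalized before cosine comparison. A minimal sketch with stand-in vectors:

```python
import numpy as np

# Stand-in for full 1024-dim embeddings, unit-normalized.
emb = np.random.default_rng(0).standard_normal((2, 1024)).astype(np.float32)
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)

# Matryoshka truncation to 256 dims, then re-normalize.
emb_256 = emb[:, :256]
emb_256 = emb_256 / np.linalg.norm(emb_256, axis=1, keepdims=True)
```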
jina-embeddings-v5-text-small-text-matching-mlx/
├── model.safetensors       # Model weights (float16)
├── model.py                # Model implementation
├── config.json             # Model configuration
├── tokenizer.json          # Tokenizer
├── tokenizer_config.json
├── vocab.json
├── merges.txt
├── .gitignore
└── README.md
@misc{akram2026jinaembeddingsv5texttasktargetedembeddingdistillation,
  title={jina-embeddings-v5-text: Task-Targeted Embedding Distillation},
  author={Mohammad Kalim Akram and Saba Sturua and Nastia Havriushenko and Quentin Herreros and Michael Günther and Maximilian Werk and Han Xiao},
  year={2026},
  eprint={2602.15547},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.15547},
}
License: CC BY-NC 4.0
Base model: Qwen/Qwen3-0.6B-Base