jina-embeddings-v5-text collection
Our 5th-gen embeddings: two lightweight multilingual models with SOTA performance in retrieval, matching, clustering, and classification.
MLX port of Jina AI's v5-text-small-text-matching embedding model for Apple Silicon.
Elastic Inference Service | ArXiv | Blog
pip install mlx tokenizers huggingface_hub
The fastest way to use v5-text in production. Elastic Inference Service (EIS) provides managed embedding inference with built-in scaling, so you can generate embeddings directly within your Elastic deployment.
PUT _inference/text_embedding/jina-v5
{
  "service": "elastic",
  "service_settings": {
    "model_id": "jina-embeddings-v5-text-small"
  }
}
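Once the endpoint exists, embeddings can be generated through the same inference API. A minimal sketch, assuming the `jina-v5` endpoint name from above:

```json
POST _inference/text_embedding/jina-v5
{
  "input": ["What is machine learning?"]
}
```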
See the Elastic Inference Service documentation for setup details.
import mlx.core as mx
from tokenizers import Tokenizer
from model import JinaEmbeddingModel
import json
# Load config
with open("config.json") as f:
    config = json.load(f)
# Load model (full precision)
model = JinaEmbeddingModel(config)
weights = mx.load("model.safetensors")
model.load_weights(list(weights.items()))
# Load tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")
# Encode
texts = ["Query: What is machine learning?", "Document: Machine learning is..."]
embeddings = model.encode(texts, tokenizer, task_type="text-matching.query")
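Once embeddings are computed, query-passage relevance is typically scored with cosine similarity. A minimal, self-contained sketch using stand-in NumPy vectors (in practice the inputs would be rows of the `embeddings` array returned by `encode` above):

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between each row of a and each row of b."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Stand-in vectors; real embeddings would come from model.encode(...)
query = np.array([[0.1, 0.3, 0.5]])
docs = np.array([[0.1, 0.3, 0.5], [0.9, -0.2, 0.1]])

scores = cosine_sim(query, docs)          # shape (1, 2)
best = int(np.argmax(scores))             # index of the most similar document
```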
For the text-matching variant, two task types are available:
text-matching.query - for search queries
text-matching.passage - for documents/passages
Supports Matryoshka embedding truncation to 32, 64, 128, 256, 512, 768, or 1024 dimensions:
# Get 256-dim embedding
embeddings_256 = embeddings[:, :256]
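One caveat worth noting: after Matryoshka truncation the vectors are no longer unit-length, so they should be re-normalized before cosine comparison. A minimal sketch with stand-in vectors:

```python
import numpy as np

# Stand-in for full 1024-dim embeddings, unit-normalized.
emb = np.random.default_rng(0).standard_normal((2, 1024)).astype(np.float32)
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)

# Matryoshka truncation to 256 dims, then re-normalize.
emb_256 = emb[:, :256]
emb_256 = emb_256 / np.linalg.norm(emb_256, axis=1, keepdims=True)
```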
jina-embeddings-v5-text-small-text-matching-mlx/
├── model.safetensors       # Model weights (float16)
├── model.py                # Model implementation
├── config.json             # Model configuration
├── tokenizer.json          # Tokenizer
├── tokenizer_config.json
├── vocab.json
├── merges.txt
├── .gitignore
└── README.md
@misc{akram2026jinaembeddingsv5texttasktargetedembeddingdistillation,
  title={jina-embeddings-v5-text: Task-Targeted Embedding Distillation},
  author={Mohammad Kalim Akram and Saba Sturua and Nastia Havriushenko and Quentin Herreros and Michael Günther and Maximilian Werk and Han Xiao},
  year={2026},
  eprint={2602.15547},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.15547},
}
License: CC BY-NC 4.0
Base model: Qwen/Qwen3-0.6B-Base