granite-embedding-97m-multilingual-r2-yat

ibm-granite/granite-embedding-97m-multilingual-r2 (ModernBERT, SiLU-gated GLU FFN) with every feed-forward block replaced by a sigmoid-gated Yat-kernel MLP. The model was trained via phased distillation: first a per-block warm-start on random-token and real inputs, then end-to-end last-layer distillation on English all-nli.
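The two distillation phases can be sketched with NumPy as a pair of objectives. This is a minimal sketch under stated assumptions, not the training code used for this model. The teacher block is written as `silu(x @ W_gate) * (x @ W_up) @ W_out`; these names are hypothetical, and biases and dropout are omitted. The per-block loss is assumed to be MSE on FFN outputs, and the last-layer loss is assumed to be 1 minus cosine similarity. The card does not say which losses were actually used.

```python
import numpy as np

def silu(z):
    return z / (1.0 + np.exp(-z))

def glu_ffn(x, W_up, W_gate, W_out):
    """Teacher-style SiLU-gated GLU FFN (hypothetical weight names):
    (silu(x W_gate) * (x W_up)) W_out, biases and dropout omitted."""
    return (silu(x @ W_gate) * (x @ W_up)) @ W_out

def block_warmstart_loss(teacher_ffn, student_ffn, hidden):
    """Phase 1 (assumed MSE): make the student FFN reproduce the teacher
    FFN's output on the same block-input hidden states."""
    diff = teacher_ffn(hidden) - student_ffn(hidden)
    return float(np.mean(diff ** 2))

def last_layer_loss(teacher_emb, student_emb):
    """Phase 2 (assumed loss): 1 - cosine similarity between teacher and
    student final-layer embeddings, averaged over the batch."""
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=-1, keepdims=True)
    s = student_emb / np.linalg.norm(student_emb, axis=-1, keepdims=True)
    return float(np.mean(1.0 - np.sum(t * s, axis=-1)))
```

In the warm-start phase, `hidden` would hold the inputs reaching a given FFN block, collected from both random-token sequences and real text. Fitting each block in isolation gives the Yat MLPs a reasonable starting point before the whole network is trained end to end.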

English MTEB STS average: 0.7268 (teacher: 0.764). Distillation used English data only; multilingual STS17/STS22 results are zero-shot (see the repo files / paper).
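MTEB scores each STS task by the Spearman correlation between the cosine similarity of paired sentence embeddings and the gold similarity labels. The "average" is then a mean over tasks. The sketch below reimplements that scoring in plain NumPy for illustration. It assumes embeddings are already computed and is not the MTEB library's own code. The helper names are hypothetical.

```python
import numpy as np

def cosine_scores(a, b):
    """Row-wise cosine similarity between paired embeddings a[i], b[i]."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return np.sum(a * b, axis=1)

def rankdata(x):
    """1-based ranks; tied values share the average of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return float(np.corrcoef(rankdata(x), rankdata(y))[0, 1])

def sts_score(emb1, emb2, gold):
    """Score one STS task: rank-correlate cosine scores with gold labels."""
    return spearman(cosine_scores(emb1, emb2), gold)
```

An overall figure such as the 0.7268 above would correspond to `np.mean([sts_score(...) for each task])`. Rank correlation makes the score insensitive to any monotone rescaling of the cosine similarities.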

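from sentence_transformers import SentenceTransformer
# trust_remote_code=True lets the repo's custom modeling code (the Yat FFN) run
m = SentenceTransformer("mlnomad/granite-embedding-97m-multilingual-r2-yat", trust_remote_code=True)
emb = m.encode(["hello world"])  # one embedding vector per input sentence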

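Yat FFN, written per hidden unit $j$ with weight vector $w_j$ (the matching column of $W$) and bias $b_j$:

$$h_j(x) = \mathrm{softplus}(\mathrm{ar})\,\frac{(x \cdot w_j + b_j)^2}{\lVert x - w_j \rVert^2 + \exp(\mathrm{le})}\,\sigma\big(\mathrm{gate}(x)\big)_j, \qquad \mathrm{FFN}(x) = h(x)\,A + c$$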
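A minimal NumPy sketch of this formula for a batch of token vectors. This is an illustration, not the repo's implementation. It assumes `gate(x)` is a linear projection `x @ G + g`, which the card does not specify. It also assumes `ar` and `le` are scalars, though per-unit arrays of shape `(h,)` would broadcast the same way. All parameter names besides `W`, `b`, `A`, `c`, `ar`, `le` are hypothetical.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)  # log(1 + e^z), numerically stable

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def yat_ffn(x, W, b, ar, le, G, g, A, c):
    """x: (n, d) tokens; W: (d, h), one column per hidden unit; b: (h,);
    ar, le: scalars; G: (d, h), g: (h,) hypothetical linear gate;
    A: (h, d) output projection; c: (d,) output bias."""
    xw = x @ W                                       # (n, h) raw dot products
    dot = xw + b
    # ||x - w_j||^2 = ||x||^2 - 2 x.w_j + ||w_j||^2, for every token/unit pair
    dist2 = (np.sum(x ** 2, axis=1, keepdims=True)
             - 2.0 * xw
             + np.sum(W ** 2, axis=0, keepdims=True))
    dist2 = np.maximum(dist2, 0.0)                   # clip tiny negative float error
    kernel = softplus(ar) * dot ** 2 / (dist2 + np.exp(le))
    gate = sigmoid(x @ G + g)                        # per-unit sigmoid gate
    return (kernel * gate) @ A + c
```

Expanding the squared distance reuses `x @ W` and avoids materializing an `(n, h, d)` difference tensor. `exp(le)` keeps the denominator strictly positive even when a token coincides with a weight vector.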

Model size: 97.5M params · Tensor type: F32 (Safetensors)