MetaLingo-Indirect-Metaphor

This model is a fine-tuned version of microsoft/deberta-v3-large for token-level binary detection of indirect metaphor in English. It is produced by a two-stage knowledge distillation pipeline. Stage 1 transfers knowledge from an auxiliary teacher model via soft-label distillation on an out-of-domain reference corpus (BE06, ~954K tokens). Stage 2 fine-tunes on the gold-standard VU Amsterdam Metaphor Corpus (VUAMC) with hard labels. Annotation follows MIPVU (Steen et al. 2010); only indirect metaphors (metaphor_type == "met") are labeled positive.

Model description

Base model: microsoft/deberta-v3-large
Task: Token classification (binary metaphor detection)
Training unit: Sentences, word-level labels
Teacher model (Stage 1): An auxiliary DeBERTa-v3-large indirect-metaphor classifier used to produce soft labels for Stage 1 distillation; not released as a standalone model.

Two-stage distillation pipeline

Stage 1 — Distillation on BE06 (~954K tokens). The student is pre-trained with KL-divergence distillation (temperature = 2.0) against soft labels produced by an auxiliary DeBERTa-v3-large teacher specialized for indirect-metaphor detection. All Stage 1 training happens on BE06, a British-English reference corpus that is completely independent of VUAMC (different source corpus, different documents, zero overlap). This stage exposes the student to a much larger and more varied token distribution than VUAMC alone provides, giving it a strong initialization for metaphor detection before it ever sees a VUAMC sentence.
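The Stage 1 objective can be sketched as a temperature-scaled KL divergence between the teacher and student distributions. The version below is a framework-free illustration for a single token, not the author's training code: the function names are hypothetical, and the T² scaling factor (a common convention in distillation) is an assumption the card does not confirm.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over a list of logits after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    ASSUMPTION: the result is multiplied by T^2 so that gradient magnitudes
    stay comparable across temperatures; the card only states T = 2.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return kl * temperature ** 2


# Two-class example: [non_metaphor, metaphor] logits for one token.
loss_same = distillation_kl([0.5, 1.5], [0.5, 1.5])
loss_diff = distillation_kl([0.0, 0.0], [2.0, -2.0])
print(loss_same, loss_diff)
```

In an actual training loop the same quantity would normally come from a framework primitive averaged over all non-padding tokens; the loss is zero only when the student reproduces the teacher's softened distribution.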

Stage 2 — Gold fine-tuning on VUAMC (90 train docs). Starting from the Stage 1 checkpoint, the model is fine-tuned with hard-label cross-entropy on the 90-document official VUAMC training split (10% randomly held out as dev, seed=42), then evaluated once on the 27-document official test split.

Because Stage 1 uses no VUAMC data and Stage 2 trains only on the official training split, the 27-document test split stays unseen until the single final evaluation.

Training & evaluation data

Dataset: VUAMC (VU Amsterdam Metaphor Corpus), covering four registers: News, Academic, Fiction, Conversation.
Split: Official NAACL FLP partition (Leong et al. 2018): 90 train documents / 27 test documents. A random 10% of training sentences is held out as a dev set (seed=42).
Test set: 27 documents, 4,080 sentences, 57,811 tokens, 6,697 gold metaphor tokens.
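The dev hold-out described above (a random 10% of training sentences, seed=42) could be reproduced along the following lines. This is a minimal sketch: the function name and the use of Python's `random` module are assumptions, and the author's actual shuffling procedure may differ, so it will not necessarily select the same sentences.

```python
import random


def split_train_dev(sentences, dev_fraction=0.1, seed=42):
    """Hold out a seeded random fraction of sentences as a dev set."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    indices = list(range(len(sentences)))
    rng.shuffle(indices)
    n_dev = round(len(sentences) * dev_fraction)
    dev_indices = set(indices[:n_dev])
    # Preserve the original sentence order within each part.
    train = [s for i, s in enumerate(sentences) if i not in dev_indices]
    dev = [s for i, s in enumerate(sentences) if i in dev_indices]
    return train, dev


# Placeholder sentences standing in for the VUAMC training sentences.
toy_sentences = [f"sentence {i}" for i in range(100)]
train_part, dev_part = split_train_dev(toy_sentences)
print(len(train_part), len(dev_part))
```

Splitting by sentence rather than by document matches the card's wording; sentences from the same document can therefore land on both sides of the train/dev boundary.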

Training hyperparameters

| Hyperparameter | Stage 1 (Distillation on BE06) | Stage 2 (Fine-tune on VUAMC) |
| --- | --- | --- |
| Loss | KL divergence (T=2) | Cross-entropy |
| Epochs | 3 | 3 |
| Learning rate | 2e-5 | 6e-6 |
| Effective batch size | 32 (8×4) | 16 (8×2) |
| Max length | 192 | 192 |
| Warmup ratio | 0.1 | 0.1 |
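The settings above can be written down as a small configuration object. The sketch below is illustrative only: the class and field names are hypothetical, and reading "8×4" and "8×2" as batch size times gradient-accumulation steps is an assumption (the second factor could equally be a device count).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    loss: str
    epochs: int
    learning_rate: float
    batch_size: int           # first factor in "8×4" / "8×2"
    accumulation_steps: int   # second factor; ASSUMED to be gradient accumulation
    max_length: int = 192
    warmup_ratio: float = 0.1

    @property
    def effective_batch_size(self):
        return self.batch_size * self.accumulation_steps

    def warmup_steps(self, total_steps):
        """Number of warmup steps for a given total number of optimizer steps."""
        return int(total_steps * self.warmup_ratio)


STAGE1 = StageConfig("kl_divergence", 3, 2e-5, 8, 4)
STAGE2 = StageConfig("cross_entropy", 3, 6e-6, 8, 2)
print(STAGE1.effective_batch_size, STAGE2.effective_batch_size)
```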

Results

Evaluated on the official 27-document test split (NAACL FLP 2018):

| Metric | Value (%) |
| --- | --- |
| F1 | 82.29 |
| Precision | 85.26 |
| Recall | 79.53 |
| Accuracy | 96.04 |
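F1 here is the harmonic mean of precision and recall, so the reported value can be checked directly from the other two rows. The helper name below is hypothetical. Because the published precision and recall are themselves rounded, the recomputed F1 can differ from the reported 82.29 in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


overall_f1 = f1_score(85.26, 79.53)
print(f"{overall_f1:.3f}")
```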

By genre

Genre is determined by BNC document ID per the NAACL FLP 2018 shared-task split (News: a*; Academic: b17, clw, cty, ecv; Fiction: bmw, ccw, faj; Conversation: k*).
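The mapping from document ID to genre can be expressed as a short lookup. This is a sketch under assumptions: the function name is hypothetical, the rule covers only the test-split documents listed above, and the example IDs in the `<three-character code>-fragmentNN` form are illustrative rather than taken from the card.

```python
ACADEMIC_IDS = {"b17", "clw", "cty", "ecv"}
FICTION_IDS = {"bmw", "ccw", "faj"}


def genre_of(doc_id):
    """Map a BNC-style document ID to its genre for the test split."""
    code = doc_id.lower()[:3]  # ASSUMPTION: the ID starts with a 3-character BNC code
    if code in ACADEMIC_IDS:
        return "Academic"
    if code in FICTION_IDS:
        return "Fiction"
    if code.startswith("a"):
        return "News"
    if code.startswith("k"):
        return "Conversation"
    raise ValueError(f"No genre rule for document ID: {doc_id!r}")


for example_id in ["a1e-fragment01", "clw-fragment01", "faj-fragment17", "kb7-fragment10"]:
    print(example_id, genre_of(example_id))
```

The explicit sets are checked before the prefix rules, since the academic and fiction codes would otherwise fall through to the error branch.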

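| Genre | Docs | Tokens | Metaphors | Precision (%) | Recall (%) | F1 (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Academic | 4 | 14,208 | 2,425 | 90.90 | 82.80 | 86.66 |
| News | 14 | 13,405 | 1,953 | 86.94 | 77.73 | 82.08 |
| Fiction | 3 | 12,764 | 1,079 | 80.72 | 83.04 | 81.86 |
| Conversation | 6 | 17,434 | 1,240 | 76.48 | 72.90 | 74.65 |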
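The per-genre rows are consistent with the overall results when pooled at token level (micro-averaging). The sketch below reconstructs approximate true-positive and predicted-positive counts from the rounded percentages; these counts are inferred for illustration and are not reported by the author.

```python
# (gold metaphor tokens, precision %, recall %) per genre, from the table above.
GENRES = {
    "Academic": (2425, 90.90, 82.80),
    "News": (1953, 86.94, 77.73),
    "Fiction": (1079, 80.72, 83.04),
    "Conversation": (1240, 76.48, 72.90),
}
TOTAL_TOKENS = 57811  # test-set size stated in the card

tp_total = predicted_total = gold_total = 0
for gold, precision, recall in GENRES.values():
    tp = round(gold * recall / 100)           # inferred true positives
    predicted = round(tp / (precision / 100))  # inferred predicted positives
    tp_total += tp
    predicted_total += predicted
    gold_total += gold

false_positives = predicted_total - tp_total
false_negatives = gold_total - tp_total

micro_precision = 100 * tp_total / predicted_total
micro_recall = 100 * tp_total / gold_total
micro_f1 = 100 * 2 * tp_total / (predicted_total + gold_total)
accuracy = 100 * (1 - (false_positives + false_negatives) / TOTAL_TOKENS)

print(f"P={micro_precision:.2f} R={micro_recall:.2f} F1={micro_f1:.2f} Acc={accuracy:.2f}")
```

Under these inferred counts the pooled figures land within rounding distance of the overall precision, recall, F1, and accuracy reported for the full test split.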

By part of speech (Universal POS)

Tags with no gold metaphors in the test set (CCONJ, INTJ, NUM, SPACE, SYM, X) are omitted.

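| POS | Tokens | Metaphors | Precision (%) | Recall (%) | F1 (%) |
| --- | --- | --- | --- | --- | --- |
| ADP | 4,959 | 1,922 | 93.62 | 90.89 | 92.24 |
| DET | 4,163 | 213 | 95.77 | 95.77 | 95.77 |
| PRON | 6,545 | 275 | 89.66 | 85.09 | 87.31 |
| SCONJ | 1,299 | 99 | 79.63 | 86.87 | 83.09 |
| VERB | 6,237 | 1,908 | 84.06 | 77.94 | 80.88 |
| PUNCT | 7,627 | 2 | 66.67 | 100.00 | 80.00 |
| ADV | 2,527 | 245 | 85.57 | 70.20 | 77.13 |
| NOUN | 8,630 | 1,383 | 83.19 | 69.78 | 75.89 |
| ADJ | 3,511 | 595 | 81.80 | 68.74 | 74.70 |
| PART | 1,881 | 8 | 57.14 | 50.00 | 53.33 |
| AUX | 4,219 | 10 | 45.45 | 50.00 | 47.62 |
| PROPN | 2,197 | 37 | 57.89 | 29.73 | 39.29 |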

Label dictionary

{
  "0": "non_metaphor",
  "1": "metaphor"
}

Subwords are aligned to words via the tokenizer's word_ids; the first subword of each word is used for prediction.
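On the training side, first-subword alignment is commonly implemented by giving only the first subword of each word its label and masking everything else out of the loss. The sketch below shows that convention; it is an assumption about how the labels were prepared, not something the card states, and the function name is hypothetical. The value -100 is the default ignore index of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(word_ids, word_labels):
    """Expand word-level labels to subword level, labeling first subwords only.

    word_ids: per-subword word index as returned by a fast tokenizer's
              word_ids() (None for special tokens).
    word_labels: one 0/1 label per word.
    """
    aligned = []
    previous = None
    for wid in word_ids:
        if wid is None:
            aligned.append(IGNORE_INDEX)      # special token such as [CLS] or [SEP]
        elif wid != previous:
            aligned.append(word_labels[wid])  # first subword carries the label
        else:
            aligned.append(IGNORE_INDEX)      # continuation subword is masked
        previous = wid
    return aligned


# Three words; the second word is split into two subwords.
example = align_labels([None, 0, 1, 1, 2, None], [0, 1, 0])
print(example)
```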

Usage example

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_path = "tommyleo2077/metalingo-indirect-metaphor"  # or local path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForTokenClassification.from_pretrained(model_path)
model.eval()  # disable dropout for inference

# Input is a pre-tokenized sentence: one list entry per word.
words = ["The", "conference", "threw", "a", "spanner", "in", "the", "works", "."]
inputs = tokenizer(
    words,
    is_split_into_words=True,
    return_tensors="pt",
    truncation=True,
    max_length=192,  # same max length as in training
)
# Maps each subword position to the index of its source word (None for special tokens).
word_ids = inputs.word_ids(batch_index=0)

with torch.no_grad():
    logits = model(**inputs).logits
# One predicted label id (0 or 1) per subword position.
preds = logits.argmax(dim=-1)[0].tolist()

# Keep only the prediction at the first subword of each word.
word_preds = {}
for i, wid in enumerate(word_ids):
    if wid is not None and wid not in word_preds:
        word_preds[wid] = preds[i]

# Words cut off by truncation have no entry and default to label 0 (non_metaphor).
for i, w in enumerate(words):
    label = "metaphor" if word_preds.get(i, 0) == 1 else "non_metaphor"
    print(f"{w}\t{label}")
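One detail of the example above is that any word lost to truncation at `max_length=192` is silently printed as `non_metaphor`. A small check like the following can surface those words instead. It is a sketch with a hypothetical helper name, and it takes the `word_ids` list produced in the usage example as input.

```python
def words_without_prediction(word_ids, num_words):
    """Return indices of input words that have no subword in the encoded input."""
    covered = {wid for wid in word_ids if wid is not None}
    return [i for i in range(num_words) if i not in covered]


# Toy case: four input words, but only words 0 and 1 survived truncation.
missing = words_without_prediction([None, 0, 1, 1, None], 4)
print(missing)
```

If the returned list is non-empty, the sentence can be split into shorter segments before inference rather than relying on the default label.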

Citation

Model author: Tommy Leo — 1683619168tl@gmail.com

Dataset (MIPVU / VUAMC):

@book{steen2010method,
  title     = {A Method for Linguistic Metaphor Identification: From {MIP} to {MIPVU}},
  author    = {Steen, Gerard and Dorst, Aletta G. and Herrmann, J. Berenike and Kaal, Anna and Krennmayr, Tina and Pasma, Thea},
  year      = {2010},
  publisher = {John Benjamins}
}

Train/test split:

@inproceedings{leong2018vua,
  title     = {A Report on the 2018 {VUA} Metaphor Detection Shared Task},
  author    = {Leong, Chee Wee and Beigman Klebanov, Beata and Shutova, Ekaterina},
  booktitle = {Proceedings of the Workshop on Figurative Language Processing at NAACL-HLT 2018},
  year      = {2018}
}

Base model: Microsoft DeBERTa

This model:

@misc{leo2025metalingoindirectmetaphor,
  title        = {metalingo-indirect-metaphor: Two-Stage Knowledge Distillation for Metaphor Detection},
  author       = {Leo, Tommy},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/tommyleo2077/metalingo-indirect-metaphor}},
  note         = {Contact: 1683619168tl@gmail.com}
}

License

Apache License 2.0 — see LICENSE for details.
