MetaLingo-Indirect-Metaphor

This model is a fine-tuned version of Microsoft DeBERTa-v3-large for token-level binary detection of indirect metaphors in English, trained on the VU Amsterdam Metaphor Corpus (VUAMC). It is produced by a two-stage pipeline. Stage 1 transfers knowledge from an auxiliary teacher model through soft-label distillation on an out-of-domain reference corpus (BE06, ~954K tokens). Stage 2 fine-tunes on the VUAMC gold standard with hard labels. Annotation follows MIPVU (Steen et al. 2010). Only indirect metaphors (metaphor_type == "met") are labeled positive, so every other word is labeled negative, including words with other MIPVU metaphor types.
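The labeling rule can be sketched in a few lines. This assumes each word carries a `metaphor_type` string, or None when it has no metaphor annotation. The helper name `binary_label` is hypothetical. The type strings other than "met" are only examples of MIPVU types that fall into the negative class.

```python
from typing import Optional

# Hypothetical helper: only MIPVU indirect metaphors ("met") are positive.
# Every other annotation type, and unannotated words (None), map to 0.
POSITIVE_TYPE = "met"


def binary_label(metaphor_type: Optional[str]) -> int:
    """Return 1 (metaphor) for indirect metaphors, otherwise 0 (non_metaphor)."""
    return 1 if metaphor_type == POSITIVE_TYPE else 0


print([binary_label(t) for t in ["met", "lit", "impl", None]])  # [1, 0, 0, 0]
```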

Model description

  • Base model: microsoft/deberta-v3-large
  • Task: Token classification (binary metaphor detection)
  • Training unit: Sentences, word-level labels
  • Teacher model (Stage 1): An auxiliary DeBERTa-v3-large indirect-metaphor classifier used to produce soft labels for Stage 1 distillation; not released as a standalone model.

Two-stage distillation pipeline

Stage 1 — Distillation on BE06 (~954K tokens). The student is initialized from microsoft/deberta-v3-large. It is then trained with KL-divergence distillation (temperature = 2.0) against soft labels from an auxiliary DeBERTa-v3-large teacher that specializes in indirect-metaphor detection. All Stage 1 training uses BE06, a British-English reference corpus that shares no source corpus and no documents with VUAMC. BE06 gives the student a larger and more varied token distribution than VUAMC alone provides. As a result, the student enters Stage 2 already initialized for metaphor detection, even though it has not yet seen any VUAMC sentence.
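Below is a minimal per-token sketch of the Stage 1 objective. It is written in plain Python so it runs without a GPU. It assumes the common Hinton-style formulation: KL(teacher || student) between temperature-softened softmax distributions over the two labels, multiplied by T². The card only says "KL divergence, T = 2", so three details are assumptions: the direction of the divergence, the T² factor, and the averaging over positions. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Temperature-scaled softmax; subtracting the max logit avoids overflow.
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(student_logits: Sequence[float],
                    teacher_logits: Sequence[float],
                    temperature: float = 2.0) -> float:
    # KL(teacher || student) between the softened distributions of one token.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Assumed T^2 factor (Hinton et al., 2015): keeps gradient scale comparable across T.
    return kl * temperature ** 2


def distillation_loss(student_batch: Sequence[Sequence[float]],
                      teacher_batch: Sequence[Sequence[float]],
                      temperature: float = 2.0) -> float:
    # Mean of the per-token losses over all labeled positions in the batch.
    losses = [distillation_kl(s, t, temperature)
              for s, t in zip(student_batch, teacher_batch)]
    return sum(losses) / len(losses)


# Two tokens, logits over [non_metaphor, metaphor].
student = [[1.2, -0.3], [0.1, 0.4]]
teacher = [[2.0, -1.0], [-0.5, 1.5]]
print(distillation_loss(student, teacher))
```

In PyTorch the same quantity is usually written as `F.kl_div(F.log_softmax(s / T, dim=-1), F.softmax(t / T, dim=-1), reduction="batchmean") * T**2`, where `F` is `torch.nn.functional`.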

Stage 2 — Gold fine-tuning on VUAMC (90 train docs). Starting from the Stage 1 checkpoint, the model is fine-tuned with hard-label cross-entropy on the 90-document official VUAMC training split. A random 10% of that split's sentences is held out as a dev set (seed=42). The model is then evaluated once on the 27-document official test split.

Stage 1 uses no VUAMC data, and the dev set is drawn from the training split. The official 27-document test split is therefore not seen during training or model selection. It is used only for the single final evaluation.

Training & evaluation data

  • Dataset: VUAMC (VU Amsterdam Metaphor Corpus), covering four registers: News, Academic, Fiction, Conversation.
  • Split: Official NAACL FLP partition (Leong et al. 2018): 90 train documents / 27 test documents. A random 10% of training sentences is held out as dev set (seed=42).
  • Test set: 27 documents, 4,080 sentences, 57,811 tokens, 6,697 gold metaphor tokens.
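The dev split can be sketched as a seeded shuffle of sentence indices. The card does not say which random-number generator or library produced the split (for example Python's `random`, NumPy, or scikit-learn's `train_test_split`). This sketch is therefore not expected to reproduce the author's exact dev set. The name `split_train_dev` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Item = TypeVar("Item")


def split_train_dev(sentences: Sequence[Item], dev_fraction: float = 0.1,
                    seed: int = 42) -> Tuple[List[Item], List[Item]]:
    # Shuffle sentence indices with a fixed seed so the split is reproducible.
    indices = list(range(len(sentences)))
    random.Random(seed).shuffle(indices)
    dev_ids = set(indices[:round(len(sentences) * dev_fraction)])
    # Keep the original sentence order inside each part.
    train = [s for i, s in enumerate(sentences) if i not in dev_ids]
    dev = [s for i, s in enumerate(sentences) if i in dev_ids]
    return train, dev


train, dev = split_train_dev([f"sentence {i}" for i in range(100)])
print(len(train), len(dev))  # 90 10
```

Splitting at the sentence level means sentences from one document can appear in both train and dev. The test documents are unaffected either way.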

Training hyperparameters

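| Hyperparameter | Stage 1 (distillation on BE06) | Stage 2 (fine-tuning on VUAMC) |
|---|---|---|
| Loss | KL divergence (T = 2) | Cross-entropy |
| Epochs | 3 | 3 |
| Learning rate | 2e-5 | 6e-6 |
| Effective batch size | 32 (8 × 4) | 16 (8 × 2) |
| Max length (subword tokens) | 192 | 192 |
| Warmup ratio | 0.1 | 0.1 |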
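The card gives a warmup ratio but does not name the learning-rate schedule. The sketch below assumes linear warmup followed by linear decay to zero, which is the default `linear` scheduler in Hugging Face `Trainer`. It also assumes one optimizer update per effective batch. The sentence count in the usage line (10,000) is hypothetical, and the function names are illustrative.

```python
import math


def warmup_steps(total_steps: int, warmup_ratio: float = 0.1) -> int:
    # Round up, so any non-zero ratio yields at least one warmup step.
    return math.ceil(total_steps * warmup_ratio)


def optimizer_steps(num_examples: int, effective_batch: int, epochs: int) -> int:
    # One optimizer update per effective batch; a final partial batch still counts.
    return math.ceil(num_examples / effective_batch) * epochs


def linear_lr(step: int, total_steps: int, peak_lr: float,
              warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step`: linear warmup to peak_lr, then linear decay to 0."""
    n_warm = warmup_steps(total_steps, warmup_ratio)
    if step < n_warm:
        return peak_lr * step / max(1, n_warm)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - n_warm))


# Stage 2 settings from the table, with a hypothetical 10,000 training sentences.
total = optimizer_steps(10_000, effective_batch=16, epochs=3)
print(total, warmup_steps(total))  # 1875 188
print(linear_lr(188, total, peak_lr=6e-6))  # 6e-06
```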

Results

Evaluated on the official 27-document test split (NAACL FLP 2018):

| Metric | Value (%) |
|---|---|
| F1 | 82.29 |
| Precision | 85.26 |
| Recall | 79.53 |
| Accuracy | 96.04 |
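The four headline metrics treat metaphor (label 1) as the positive class. The helper below computes all four from confusion-matrix counts as a consistency check. The counts in the usage line do not come from the author's logs. They are back-solved from the reported percentages and the test-set totals (6,697 gold metaphor tokens, 57,811 tokens). They are the only integer counts that reproduce all four rounded figures. Because about 88% of test tokens are non-metaphors, accuracy is far higher than F1.

```python
from typing import Dict


def token_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    # Metaphor (label 1) is the positive class; results are percentages.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    raw = {"F1": f1, "Precision": precision, "Recall": recall, "Accuracy": accuracy}
    return {name: round(100 * value, 2) for name, value in raw.items()}


# Back-solved (not logged) counts: tp + fn = 6,697 gold metaphors, total = 57,811.
print(token_metrics(tp=5326, fp=921, fn=1371, tn=50193))
# {'F1': 82.29, 'Precision': 85.26, 'Recall': 79.53, 'Accuracy': 96.04}
```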

By genre

Genre is determined by BNC document ID per the NAACL FLP 2018 shared-task split (News: a*; Academic: b17, clw, cty, ecv; Fiction: bmw, ccw, faj; Conversation: k*).

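| Genre | Docs | Tokens | Metaphors | Precision | Recall | F1 |
|---|---|---|---|---|---|---|
| Academic | 4 | 14,208 | 2,425 | 90.90 | 82.80 | 86.66 |
| News | 14 | 13,405 | 1,953 | 86.94 | 77.73 | 82.08 |
| Fiction | 3 | 12,764 | 1,079 | 80.72 | 83.04 | 81.86 |
| Conversation | 6 | 17,434 | 1,240 | 76.48 | 72.90 | 74.65 |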
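The genre rule stated above the table can be expressed as a prefix lookup. This sketch copies the prefixes listed on this card and applies only to test-split document IDs; it is not a general VUAMC genre lookup. The function name and the IDs "a001" and "k001" in the usage line are made up for illustration.

```python
from typing import Optional

# Prefixes for the 27 test documents, copied from the card.
ACADEMIC = ("b17", "clw", "cty", "ecv")
FICTION = ("bmw", "ccw", "faj")


def genre_of(doc_id: str) -> Optional[str]:
    """Map a test-split BNC document ID to its register, or None if no rule matches."""
    doc = doc_id.lower()
    if doc.startswith(ACADEMIC):
        return "Academic"
    if doc.startswith(FICTION):
        return "Fiction"
    if doc.startswith("a"):
        return "News"
    if doc.startswith("k"):
        return "Conversation"
    return None


print(genre_of("b17"), genre_of("k001"))  # Academic Conversation
```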

By part of speech (Universal POS)

Tags with no gold metaphors in the test set (CCONJ, INTJ, NUM, SPACE, SYM, X) are omitted.

| POS | Tokens | Metaphors | Precision | Recall | F1 |
|---|---|---|---|---|---|
| ADP | 4,959 | 1,922 | 93.62 | 90.89 | 92.24 |
| DET | 4,163 | 213 | 95.77 | 95.77 | 95.77 |
| PRON | 6,545 | 275 | 89.66 | 85.09 | 87.31 |
| SCONJ | 1,299 | 99 | 79.63 | 86.87 | 83.09 |
| VERB | 6,237 | 1,908 | 84.06 | 77.94 | 80.88 |
| PUNCT | 7,627 | 2 | 66.67 | 100.00 | 80.00 |
| ADV | 2,527 | 245 | 85.57 | 70.20 | 77.13 |
| NOUN | 8,630 | 1,383 | 83.19 | 69.78 | 75.89 |
| ADJ | 3,511 | 595 | 81.80 | 68.74 | 74.70 |
| PART | 1,881 | 8 | 57.14 | 50.00 | 53.33 |
| AUX | 4,219 | 10 | 45.45 | 50.00 | 47.62 |
| PROPN | 2,197 | 37 | 57.89 | 29.73 | 39.29 |
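The per-tag rows can be recomputed from word-level (tag, gold, prediction) triples. This sketch assumes each test word already carries a Universal POS tag. The card does not say which tagger produced the tags, and the function name is hypothetical. Tags with no gold metaphors are skipped, which matches the omission rule stated above the table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_tag_scores(rows: Iterable[Tuple[str, int, int]]) -> Dict[str, Tuple[float, float, float]]:
    """rows: (upos_tag, gold_label, predicted_label) per word; returns P/R/F1 in %."""
    counts = defaultdict(lambda: [0, 0, 0])  # tag -> [tp, fp, fn]
    for tag, gold, pred in rows:
        c = counts[tag]
        if gold == 1 and pred == 1:
            c[0] += 1
        elif gold == 0 and pred == 1:
            c[1] += 1
        elif gold == 1 and pred == 0:
            c[2] += 1
    scores = {}
    for tag, (tp, fp, fn) in counts.items():
        if tp + fn == 0:
            continue  # no gold metaphors for this tag: omitted from the report
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        scores[tag] = (round(100 * p, 2), round(100 * r, 2), round(100 * f1, 2))
    return scores


rows = [("PUNCT", 1, 1), ("PUNCT", 1, 1), ("PUNCT", 0, 1), ("PUNCT", 0, 0), ("NUM", 0, 0)]
print(per_tag_scores(rows))  # {'PUNCT': (66.67, 100.0, 80.0)}
```

The toy rows reproduce the arithmetic of the PUNCT row. That row has 2 gold metaphors, both found (recall 100), plus one false positive (precision 66.67). NUM has no gold metaphor, so it is dropped.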

Label dictionary

{
  "0": "non_metaphor",
  "1": "metaphor"
}
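JSON object keys are always strings, while predicted class indices (for example an argmax over logits) are integers. A small sketch that converts the dictionary into integer-keyed lookups in both directions is shown below. If the author saved this mapping in the model config, `model.config.id2label` should already provide integer keys; that is an assumption, not something the card states.

```python
import json

# The label dictionary as shown on the card; JSON keys are strings.
LABELS_JSON = '{"0": "non_metaphor", "1": "metaphor"}'

id2label = {int(k): v for k, v in json.loads(LABELS_JSON).items()}
label2id = {v: k for k, v in id2label.items()}


def decode(pred_ids):
    # Map predicted class indices to label names.
    return [id2label[i] for i in pred_ids]


print(decode([0, 1, 0]))  # ['non_metaphor', 'metaphor', 'non_metaphor']
```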

Subwords are aligned to words via the tokenizer's word_ids; the first subword of each word is used for prediction.
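At inference, the first-subword rule decides which prediction represents each word. For training, the usual counterpart is to place each word's label on its first subword and mask every other position with -100, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss`. The sketch below assumes the card's training code did this, which the card does not confirm. The input is a list in the form returned by a fast tokenizer's `word_ids()`. The example IDs are hypothetical.

```python
from typing import List, Optional

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def align_labels(word_ids: List[Optional[int]], word_labels: List[int]) -> List[int]:
    """Give each word's label to its first subword; mask special tokens and continuations."""
    aligned = []
    seen = set()
    for wid in word_ids:
        if wid is None or wid in seen:
            aligned.append(IGNORE_INDEX)  # special token or non-first subword
        else:
            aligned.append(word_labels[wid])
            seen.add(wid)
    return aligned


# Hypothetical: [CLS], word 0, word 1 split into two subwords, word 2, [SEP].
print(align_labels([None, 0, 1, 1, 2, None], [0, 1, 0]))  # [-100, 0, 1, -100, 0, -100]
```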

Usage example

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

model_path = "tommyleo2077/metalingo-indirect-metaphor"  # or local path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForTokenClassification.from_pretrained(model_path)
model.eval()

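# Input is pre-split into words; one label is predicted per word.
words = ["The", "conference", "threw", "a", "spanner", "in", "the", "works", "."]
inputs = tokenizer(
    words,
    is_split_into_words=True,
    return_tensors="pt",
    truncation=True,
    max_length=192,  # same maximum length as in training
)
# Maps each subword position to its word index (None for special tokens).
word_ids = inputs.word_ids(batch_index=0)

with torch.no_grad():
    logits = model(**inputs).logits
preds = logits.argmax(dim=-1)[0].tolist()

# Keep the prediction of the first subword of each word.
word_preds = {}
for i, wid in enumerate(word_ids):
    if wid is not None and wid not in word_preds:
        word_preds[wid] = preds[i]

# Words cut off by truncation have no prediction and default to non_metaphor.
for i, w in enumerate(words):
    label = "metaphor" if word_preds.get(i, 0) == 1 else "non_metaphor"
    print(f"{w}\t{label}")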

Citation

Model author: Tommy Leo — 1683619168tl@gmail.com

Dataset (MIPVU / VUAMC):

@book{steen2010method,
  title     = {A Method for Linguistic Metaphor Identification: From {MIP} to {MIPVU}},
  author    = {Steen, Gerard and Dorst, Aletta G. and Herrmann, J. Berenike and Kaal, Anna and Krennmayr, Tina and Pasma, Thea},
  year      = {2010},
  publisher = {John Benjamins}
}

Train/test split:

@inproceedings{leong2018vua,
  title     = {A Report on the 2018 {VUA} Metaphor Detection Shared Task},
  author    = {Leong, Chee Wee and Beigman Klebanov, Beata and Shutova, Ekaterina},
  booktitle = {Proceedings of the Workshop on Figurative Language Processing at NAACL-HLT 2018},
  year      = {2018}
}

Base model: Microsoft DeBERTa

This model:

@misc{leo2025metalingoindirectmetaphor,
  title        = {metalingo-indirect-metaphor: Two-Stage Knowledge Distillation for Metaphor Detection},
  author       = {Leo, Tommy},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/tommyleo2077/metalingo-indirect-metaphor}},
  note         = {Contact: 1683619168tl@gmail.com}
}

License

Apache License 2.0 — see LICENSE for details.
