mdeberta-v3-en-wanli-nli

Fine-tuned microsoft/mdeberta-v3-base on the WANLI dataset for natural language inference.

  • Task: natural language inference (zero-shot classification)
  • Language: English
  • License: CC BY 4.0
  • Base model: microsoft/mdeberta-v3-base
  • Model size: ~0.3B parameters (F32)

Usage

from transformers import pipeline

model_id = "takehika/mdeberta-v3-en-wanli-nli"
classifier = pipeline("zero-shot-classification", model=model_id)

text = "California's sunny weather and diverse theme parks make it a perfect place for family vacations."
labels = ["Weather", "Environment", "Entertainment", "Economy", "Politics"]

output = classifier(text, labels, multi_label=False)
print(output)
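Under the hood, the pipeline turns each candidate label into an NLI hypothesis (by default, "This example is {label}.") and scores how strongly the model finds it entailed by the text. With multi_label=False, the per-label entailment logits are softmaxed against each other, so the scores sum to 1. A minimal sketch of that final step, using made-up logits for the five labels above (the values are illustrative, not actual model outputs):

```python
import math

def scores_from_entailment_logits(logits):
    """Softmax across per-label entailment logits, as the pipeline
    does when multi_label=False (scores sum to 1 over the labels)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["Weather", "Environment", "Entertainment", "Economy", "Politics"]
logits = [2.1, 0.3, 2.8, -1.0, -1.5]  # hypothetical entailment logits
scores = scores_from_entailment_logits(logits)
ranked = sorted(zip(labels, scores), key=lambda p: p[1], reverse=True)
```

With multi_label=True, each label would instead be scored independently as entailment vs. contradiction, so scores need not sum to 1.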

Data

  • Dataset: WANLI (alisawuffles/WANLI)
  • Training data: 102,885 examples
  • Validation data: 5,000 examples (WANLI test split)

Training

  • Base: microsoft/mdeberta-v3-base
  • Epochs: 2
  • Learning rate: 2e-5
  • Warmup ratio: 0.1
  • Batch size: 4 per device, gradient accumulation 2 (effective batch size 8)
  • Evaluation/checkpoint interval: every 2,000 steps
  • Best-model selection metric: F1
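From these hyperparameters, the optimizer step counts follow directly. A back-of-the-envelope sketch, assuming a single device and the last partial batch being kept:

```python
import math

train_examples = 102_885
per_device_batch = 4
grad_accum = 2
epochs = 2
warmup_ratio = 0.1

effective_batch = per_device_batch * grad_accum  # 8 on a single device
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = int(total_steps * warmup_ratio)
```

So the 0.1 warmup ratio corresponds to roughly the first 2,600 optimizer steps, and evaluation every 2,000 steps yields about a dozen checkpoints over the run.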

Evaluation

  • Accuracy: 0.7316
  • F1: 0.7309

Intended Use & Limitations

  • Intended for English NLI.
  • Inputs are limited to 512 tokens (longer pairs are truncated).
  • Domain shifts or adversarial examples can reduce performance.
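To see what the 512-token limit means for a premise/hypothesis pair: the encoder also consumes special tokens, and the tokenizer's default longest-first strategy drops tokens from the longer sequence until the pair fits. A sketch of that behavior on raw token-id lists (num_special=3 assumes a [CLS]/[SEP]/[SEP] layout; this mimics, rather than calls, the tokenizer):

```python
def truncate_pair(premise_ids, hypothesis_ids, max_len=512, num_special=3):
    """Longest-first truncation: repeatedly drop the last token of the
    longer sequence until the pair plus special tokens fits max_len."""
    a, b = list(premise_ids), list(hypothesis_ids)
    while len(a) + len(b) + num_special > max_len:
        if len(a) >= len(b):
            a.pop()
        else:
            b.pop()
    return a, b

# A 600-token premise with a 100-token hypothesis: only the premise
# is trimmed, down to whatever still fits.
a, b = truncate_pair(range(600), range(100))
```

The practical upshot: with very long premises, the tail of the premise silently disappears before classification.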

Attribution & Licenses

This model is a fine-tuned version of the base model, trained on the dataset listed above.

Base Model Citation

Please cite the following when using the DeBERTa base model:

@misc{he2021debertav3,
      title={DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing}, 
      author={Pengcheng He and Jianfeng Gao and Weizhu Chen},
      year={2021},
      eprint={2111.09543},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@inproceedings{he2021deberta,
      title={DEBERTA: DECODING-ENHANCED BERT WITH DISENTANGLED ATTENTION},
      author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
      booktitle={International Conference on Learning Representations},
      year={2021},
      url={https://openreview.net/forum?id=XPZIaotutsD}
}

Dataset Citation

Please cite the following when using the WANLI dataset:

@misc{liu-etal-2022-wanli,
    title = "WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation",
    author = "Liu, Alisa  and
      Swayamdipta, Swabha  and
      Smith, Noah A.  and
      Choi, Yejin",
    month = jan,
    year = "2022",
    url = "https://arxiv.org/pdf/2201.05955",
}