mdeberta-v3-en-wanli-nli

Fine-tuned microsoft/mdeberta-v3-base on the WANLI dataset for natural language inference.

  • Task: natural language inference (zero-shot classification)
  • Language: English
  • License: CC BY 4.0
  • Base model: microsoft/mdeberta-v3-base
  • Model size: ~0.3B parameters (F32)

Usage

from transformers import pipeline

model_id = "takehika/mdeberta-v3-en-wanli-nli"
classifier = pipeline("zero-shot-classification", model=model_id)

text = "California's sunny weather and diverse theme parks make it a perfect place for family vacations."
labels = ["Weather", "Environment", "Entertainment", "Economy", "Politics"]

output = classifier(text, labels, multi_label=False)
print(output)
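Under the hood, the pipeline turns each candidate label into an NLI hypothesis (by default, "This example is {label}.") and scores how strongly the model finds it entailed by the text. With multi_label=False, the per-label entailment logits are softmaxed against each other, so the scores sum to 1. A minimal sketch of that final step, using made-up logits for the five labels above (the values are illustrative, not actual model outputs):

```python
import math

def scores_from_entailment_logits(logits):
    """Softmax across per-label entailment logits, as the pipeline
    does when multi_label=False (scores sum to 1 over the labels)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["Weather", "Environment", "Entertainment", "Economy", "Politics"]
logits = [2.1, 0.3, 2.8, -1.0, -1.5]  # hypothetical entailment logits
scores = scores_from_entailment_logits(logits)
ranked = sorted(zip(labels, scores), key=lambda p: p[1], reverse=True)
```

With multi_label=True, each label would instead be scored independently as entailment vs. contradiction, so scores need not sum to 1.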

Data

  • Dataset: WANLI (alisawuffles/WANLI)
  • Training data: 102,885 examples
  • Validation data: 5,000 examples (WANLI test split)

Training

  • Base: microsoft/mdeberta-v3-base
  • Epochs: 2
  • Learning rate: 2e-5
  • Warmup ratio: 0.1
  • Batch size: 4 per device, gradient accumulation 2 (effective batch size 8)
  • Evaluation/checkpoint interval: every 2,000 steps
  • Best-model selection metric: F1
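From these hyperparameters, the optimizer step counts follow directly. A back-of-the-envelope sketch, assuming a single device and the last partial batch being kept:

```python
import math

train_examples = 102_885
per_device_batch = 4
grad_accum = 2
epochs = 2
warmup_ratio = 0.1

effective_batch = per_device_batch * grad_accum  # 8 on a single device
steps_per_epoch = math.ceil(train_examples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = int(total_steps * warmup_ratio)
```

So the 0.1 warmup ratio corresponds to roughly the first 2,600 optimizer steps, and evaluation every 2,000 steps yields about a dozen checkpoints over the run.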

Evaluation

  • Accuracy: 0.7316
  • F1: 0.7309

Intended Use & Limitations

  • Intended for English NLI.
  • Inputs are limited to 512 tokens (longer pairs are truncated).
  • Domain shifts or adversarial examples can reduce performance.
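To see what the 512-token limit means for a premise/hypothesis pair: the encoder also consumes special tokens, and the tokenizer's default longest-first strategy drops tokens from the longer sequence until the pair fits. A sketch of that behavior on raw token-id lists (num_special=3 assumes a [CLS]/[SEP]/[SEP] layout; this mimics, rather than calls, the tokenizer):

```python
def truncate_pair(premise_ids, hypothesis_ids, max_len=512, num_special=3):
    """Longest-first truncation: repeatedly drop the last token of the
    longer sequence until the pair plus special tokens fits max_len."""
    a, b = list(premise_ids), list(hypothesis_ids)
    while len(a) + len(b) + num_special > max_len:
        if len(a) >= len(b):
            a.pop()
        else:
            b.pop()
    return a, b

# A 600-token premise with a 100-token hypothesis: only the premise
# is trimmed, down to whatever still fits.
a, b = truncate_pair(range(600), range(100))
```

The practical upshot: with very long premises, the tail of the premise silently disappears before classification.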

Attribution & Licenses

This model is a fine-tuned version of the base model, trained on the dataset listed above.

Base Model Citation

Please cite the following when using the DeBERTa base model:

@misc{he2021debertav3,
      title={DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing}, 
      author={Pengcheng He and Jianfeng Gao and Weizhu Chen},
      year={2021},
      eprint={2111.09543},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@inproceedings{he2021deberta,
      title={DEBERTA: DECODING-ENHANCED BERT WITH DISENTANGLED ATTENTION},
      author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
      booktitle={International Conference on Learning Representations},
      year={2021},
      url={https://openreview.net/forum?id=XPZIaotutsD}
}

Dataset Citation

Please cite the following when using the WANLI dataset:

@misc{liu-etal-2022-wanli,
    title = "WANLI: Worker and AI Collaboration for Natural Language Inference Dataset Creation",
    author = "Liu, Alisa  and
      Swayamdipta, Swabha  and
      Smith, Noah A.  and
      Choi, Yejin",
    month = jan,
    year = "2022",
    url = "https://arxiv.org/pdf/2201.05955",
}