MINT-empathy-Qwen3-1.7B

MINT (Multi-turn Inter-tactic Novelty Training) model for empathic dialogue, fine-tuned from Qwen/Qwen3-1.7B.

This is the Q+D_KL variant (Quality + KL-divergence tactic diversity reward), the best-performing MINT configuration from the paper.

What is MINT?

MINT is a reinforcement learning framework that trains empathic dialogue models to diversify their discourse moves across conversation turns. Most models lock into repetitive empathy tactics (e.g., always validating emotions); MINT combines an empathy quality reward with a cross-turn tactic novelty signal via GRPO to break this pattern.

Trained on 322 multi-turn emotional support conversations and evaluated with the Lend-an-Ear framework across 6 empathy dimensions.
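For intuition, GRPO scores each response relative to the other rollouts sampled for the same prompt. Below is a minimal sketch of that group-relative advantage computation; it is an illustration of the general GRPO idea, not the paper's implementation (the function name and zero-variance guard are my own).

```python
import statistics

def grpo_advantages(rewards):
    """Standardize each rollout's reward against the mean and std
    of its own sampling group (MINT uses n=8 rollouts per prompt)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero-variance groups
    return [(r - mean) / std for r in rewards]

# Rewards for one prompt's rollout group -> zero-mean, unit-scale advantages
print(grpo_advantages([0.2, 0.8, 0.5, 0.5]))
```

Responses that beat their group mean get positive advantages and are reinforced; identical rewards across a group yield zero advantage everywhere.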

Training

Method: GRPO (Group Relative Policy Optimization) via VERL
Reward: Quality (PsychoCounsel) + tactic diversity (KL divergence)
Base model: Qwen/Qwen3-1.7B
KL coefficient: 0.01
Diversity weight: 1.0
Max response length: 2048 tokens
Rollouts: n=8 per prompt
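The Q+D_KL reward combines the quality score with a KL-divergence novelty term between tactic distributions across turns. The sketch below shows one plausible shape of that combination, assuming tactics are represented as probability distributions over a shared tactic vocabulary; the function names, smoothing constant, and distribution encoding are assumptions, not the paper's code.

```python
import math

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) over a shared tactic vocabulary, with additive smoothing."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mint_reward(quality, turn_tactics, history_tactics, diversity_weight=1.0):
    """Quality reward plus KL-based cross-turn tactic novelty (sketch).

    quality         -- scalar quality score for the current response
    turn_tactics    -- tactic distribution of the current turn
    history_tactics -- tactic distribution aggregated over earlier turns
    """
    return quality + diversity_weight * kl_divergence(turn_tactics, history_tactics)

# A turn that repeats the history's tactic mix earns no diversity bonus;
# a turn that shifts to under-used tactics earns a positive one.
print(mint_reward(0.7, [0.5, 0.5], [0.5, 0.5]))
print(mint_reward(0.7, [0.9, 0.1], [0.1, 0.9]))
```

With diversity_weight=1.0 (the paper's setting), the novelty term is on equal footing with quality, so the policy is pushed away from repeating the same empathy tactic turn after turn.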

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned policy and its tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("hongli-zhan/MINT-empathy-Qwen3-1.7B")
tokenizer = AutoTokenizer.from_pretrained("hongli-zhan/MINT-empathy-Qwen3-1.7B")

With vLLM:

from vllm import LLM

llm = LLM(model="hongli-zhan/MINT-empathy-Qwen3-1.7B")

Citation

@article{zhan2026discourse,
  title={Discourse Diversity in Multi-Turn Empathic Dialogue},
  author={Zhan, Hongli and Gueorguieva, Emma S and Hernandez, Javier and Suh, Jina and Ong, Desmond C and Li, Junyi Jessy},
  journal={arXiv preprint arXiv:2604.11742},
  year={2026}
}

Project Page | Code

Model size: 2B params · Tensor type: BF16 (Safetensors)

Model tree for hongli-zhan/MINT-empathy-Qwen3-1.7B

Finetuned from: Qwen/Qwen3-1.7B
Quantizations: 1 model
