Discourse Diversity in Multi-Turn Empathic Dialogue
Paper: arXiv:2604.11742
MINT (Multi-turn Inter-tactic Novelty Training) model for empathic dialogue, fine-tuned from Qwen/Qwen3-1.7B.
This is the Q+D_KL variant (Quality + KL-divergence tactic diversity reward), the best-performing MINT configuration from the paper.
MINT is a reinforcement learning framework that trains empathic dialogue models to diversify their discourse moves across conversation turns. Most models lock into repetitive empathy tactics (e.g., always validating emotions); MINT combines an empathy quality reward with a cross-turn tactic novelty signal via GRPO to break this pattern.
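The combined reward described above (empathy quality plus a cross-turn tactic-novelty bonus measured with KL divergence) can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the tactic labels, add-epsilon smoothing, and function names are all assumptions.

```python
import math
from collections import Counter

TACTICS = ["validate", "question", "suggest", "self-disclose"]  # illustrative tactic set

def tactic_distribution(tactics, vocab=TACTICS, eps=1e-6):
    """Smoothed empirical distribution of tactic labels over a fixed vocabulary."""
    counts = Counter(tactics)
    total = len(tactics) + eps * len(vocab)
    return [(counts.get(t, 0) + eps) / total for t in vocab]

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def mint_reward(quality, turn_tactics, history_tactics, diversity_weight=1.0):
    """Combined reward: empathy quality score plus a novelty bonus that is
    large when this turn's tactic mix diverges from the conversation history."""
    novelty = kl_divergence(tactic_distribution(turn_tactics),
                            tactic_distribution(history_tactics))
    return quality + diversity_weight * novelty
```

Under this sketch, a turn that repeats the tactic dominating the history earns almost no novelty bonus, while switching to a fresh tactic does, which is what pushes the policy away from locked-in empathy moves.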
Trained on 322 multi-turn emotional support conversations and evaluated with the Lend-an-Ear framework across six empathy dimensions.
| Setting | Value |
| --- | --- |
| Method | GRPO (Group Relative Policy Optimization) via VERL |
| Reward | Quality (PsychoCounsel) + tactic diversity (KL divergence) |
| Base model | Qwen/Qwen3-1.7B |
| KL coefficient | 0.01 |
| Diversity weight | 1.0 |
| Response length | 2048 tokens |
| Rollouts | n = 8 per prompt |
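GRPO, listed above, scores each prompt's group of n = 8 rollouts and normalizes every rollout's reward against the group's statistics. A minimal sketch of that group-relative advantage step (illustrative only, not the VERL implementation):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: center each rollout's reward on the group mean
    and scale by the group standard deviation (group = rollouts of one prompt)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:  # identical rewards within a group carry no learning signal
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]
```

Rollouts scoring above the group mean get positive advantages and are reinforced; below-mean rollouts are suppressed, with no learned value function required.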
With Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("hongli-zhan/MINT-empathy-Qwen3-1.7B")
tokenizer = AutoTokenizer.from_pretrained("hongli-zhan/MINT-empathy-Qwen3-1.7B")
```
With vLLM:
```python
from vllm import LLM

llm = LLM(model="hongli-zhan/MINT-empathy-Qwen3-1.7B")
```
```bibtex
@article{zhan2026discourse,
  title={Discourse Diversity in Multi-Turn Empathic Dialogue},
  author={Zhan, Hongli and Gueorguieva, Emma S and Hernandez, Javier and Suh, Jina and Ong, Desmond C and Li, Junyi Jessy},
  journal={arXiv preprint arXiv:2604.11742},
  year={2026}
}
```