Discourse Diversity in Multi-Turn Empathic Dialogue
Paper: arXiv:2604.11742
MINT (Multi-turn Inter-tactic Novelty Training) model for empathic dialogue, fine-tuned from Qwen/Qwen3-1.7B.
This is the Q+D_KL variant (Quality + KL-divergence tactic diversity reward), the best-performing MINT configuration from the paper.
MINT is a reinforcement learning framework that trains empathic dialogue models to diversify their discourse moves across conversation turns. Most models lock into repetitive empathy tactics (e.g., always validating emotions); MINT combines an empathy quality reward with a cross-turn tactic novelty signal via GRPO to break this pattern.
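The combined reward described above (empathy quality plus a cross-turn tactic-novelty bonus measured with KL divergence) can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the tactic labels, add-epsilon smoothing, and function names are all assumptions.

```python
import math
from collections import Counter

TACTICS = ["validate", "question", "suggest", "self-disclose"]  # illustrative tactic set

def tactic_distribution(tactics, vocab=TACTICS, eps=1e-6):
    """Smoothed empirical distribution of tactic labels over a fixed vocabulary."""
    counts = Counter(tactics)
    total = len(tactics) + eps * len(vocab)
    return [(counts.get(t, 0) + eps) / total for t in vocab]

def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def mint_reward(quality, turn_tactics, history_tactics, diversity_weight=1.0):
    """Combined reward: empathy quality score plus a novelty bonus that is
    large when this turn's tactic mix diverges from the conversation history."""
    novelty = kl_divergence(tactic_distribution(turn_tactics),
                            tactic_distribution(history_tactics))
    return quality + diversity_weight * novelty
```

Under this sketch, a turn that repeats the tactic dominating the history earns almost no novelty bonus, while switching to a fresh tactic does, which is what pushes the policy away from locked-in empathy moves.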
Trained on 322 multi-turn emotional support conversations and evaluated with the Lend-an-Ear framework across six empathy dimensions.
| Setting | Value |
| --- | --- |
| Method | GRPO (Group Relative Policy Optimization) via VERL |
| Reward | Quality (PsychoCounsel) + tactic diversity (KL divergence) |
| Base model | Qwen/Qwen3-1.7B |
| KL coefficient | 0.01 |
| Diversity weight | 1.0 |
| Response length | 2048 tokens |
| Rollouts | n = 8 per prompt |
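GRPO, listed above, scores each prompt's group of n = 8 rollouts and normalizes every rollout's reward against the group's statistics. A minimal sketch of that group-relative advantage step (illustrative only, not the VERL implementation):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: center each rollout's reward on the group mean
    and scale by the group standard deviation (group = rollouts of one prompt)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:  # identical rewards within a group carry no learning signal
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]
```

Rollouts scoring above the group mean get positive advantages and are reinforced; below-mean rollouts are suppressed, with no learned value function required.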
With Transformers:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("hongli-zhan/MINT-empathy-Qwen3-1.7B")
tokenizer = AutoTokenizer.from_pretrained("hongli-zhan/MINT-empathy-Qwen3-1.7B")
```
With vLLM:
```python
from vllm import LLM

llm = LLM(model="hongli-zhan/MINT-empathy-Qwen3-1.7B")
```
```bibtex
@article{zhan2026discourse,
  title={Discourse Diversity in Multi-Turn Empathic Dialogue},
  author={Zhan, Hongli and Gueorguieva, Emma S and Hernandez, Javier and Suh, Jina and Ong, Desmond C and Li, Junyi Jessy},
  journal={arXiv preprint arXiv:2604.11742},
  year={2026}
}
```