---
license: apache-2.0
language:
- en
- es
- fr
- de
- zh
- ja
- ko
- ar
- pt
- ru
- hi
- multilingual
library_name: transformers
tags:
- text-classification
- feedback-detection
- user-satisfaction
- mmbert
- modernbert
- multilingual
- vllm-semantic-router
datasets:
- llm-semantic-router/feedback-detector-dataset
metrics:
- accuracy
- f1
base_model: jhu-clsp/mmBERT-base
pipeline_tag: text-classification
model-index:
- name: mmbert-feedback-detector-merged
  results:
  - task:
      type: text-classification
      name: User Feedback Classification
    dataset:
      name: feedback-detector-dataset
      type: llm-semantic-router/feedback-detector-dataset
    metrics:
    - type: accuracy
      value: 0.9689
      name: Accuracy
    - type: f1
      value: 0.9688
      name: F1 Macro
---

# mmBERT Feedback Detector (Merged)

A **multilingual** 4-class user feedback classifier built on [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base). The model classifies user responses into satisfaction categories to help understand user intent in conversational AI systems.

## Model Description

This is the **merged model** (LoRA weights merged into the base model) for direct inference without PEFT. For the LoRA adapter version, see [llm-semantic-router/mmbert-feedback-detector-lora](https://huggingface.co/llm-semantic-router/mmbert-feedback-detector-lora).
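Merging folds each low-rank LoRA update back into its dense base weight, `W' = W + (alpha / r) * B @ A`, which is why the merged checkpoint needs no adapter machinery at inference time. A toy sketch of that arithmetic (shapes and names here are illustrative assumptions, not this model's actual modules):

```python
import torch

# Toy LoRA merge: fold a rank-r update into a dense weight.
# Dimensions, rank, and alpha are illustrative, not taken from this model.
d_out, d_in, r, alpha = 8, 8, 2, 4
W = torch.randn(d_out, d_in)   # frozen base weight
A = torch.randn(r, d_in)       # LoRA down-projection
B = torch.randn(d_out, r)      # LoRA up-projection

# The merge itself: W' = W + (alpha / r) * B A
W_merged = W + (alpha / r) * (B @ A)

# The merged weight reproduces the base-plus-adapter forward pass exactly.
x = torch.randn(d_in)
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))
y_merged = W_merged @ x
assert torch.allclose(y_adapter, y_merged, atol=1e-5)
```

In practice this is what PEFT's `merge_and_unload()` performs per adapted module; the result loads like any plain `AutoModelForSequenceClassification`, as in the Usage examples below.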
### Labels

| Label | ID | Description |
|-------|-----|-------------|
| `SAT` | 0 | User is satisfied with the response |
| `NEED_CLARIFICATION` | 1 | User needs more explanation or clarification |
| `WRONG_ANSWER` | 2 | User indicates the response is incorrect |
| `WANT_DIFFERENT` | 3 | User wants alternative options or a different response |

## Performance

| Metric | Score |
|--------|-------|
| **Accuracy** | 96.89% |
| **F1 Macro** | 96.88% |
| **F1 Weighted** | 96.88% |

### Per-Class Performance

| Class | F1 Score |
|-------|----------|
| SAT | 100.0% |
| NEED_CLARIFICATION | 99.7% |
| WRONG_ANSWER | 94.0% |
| WANT_DIFFERENT | 93.8% |

## Multilingual Support

Thanks to mmBERT's multilingual pretraining (256k vocabulary, 100+ languages), this model achieves excellent cross-lingual transfer:

| Language | Accuracy |
|----------|----------|
| 🇺🇸 English | 100% |
| 🇪🇸 Spanish | 100% |
| 🇫🇷 French | 100% |
| 🇩🇪 German | 100% |
| 🇨🇳 Chinese | 100% |
| 🇯🇵 Japanese | 100% |
| 🇰🇷 Korean | 100% |
| 🇸🇦 Arabic | 100% |
| 🇵🇹 Portuguese | 100% |
| 🇷🇺 Russian | 100% |

## Usage

### With Transformers

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

model_name = "llm-semantic-router/mmbert-feedback-detector-merged"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Example: classify user feedback
text = "Thanks, that's exactly what I needed!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

probs = torch.softmax(outputs.logits, dim=-1)
pred = probs.argmax().item()

labels = ["SAT", "NEED_CLARIFICATION", "WRONG_ANSWER", "WANT_DIFFERENT"]
print(f"Prediction: {labels[pred]} ({probs[0][pred]:.1%})")
# Output: Prediction: SAT (100.0%)
```

### With Pipeline

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="llm-semantic-router/mmbert-feedback-detector-merged"
)

# English
result = classifier("Thanks, that's helpful!")
print(result)  # [{'label': 'SAT', 'score': 0.999...}]

# Spanish (cross-lingual transfer)
result = classifier("¡Gracias, eso es muy útil!")
print(result)  # [{'label': 'SAT', 'score': 0.999...}]

# Chinese
result = classifier("谢谢,这很有帮助!")
print(result)  # [{'label': 'SAT', 'score': 0.98...}]
```

## Training Details

- **Base Model**: [jhu-clsp/mmBERT-base](https://huggingface.co/jhu-clsp/mmBERT-base)
- **Method**: LoRA fine-tuning + merge
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
- **Learning Rate**: 2e-5
- **Batch Size**: 32
- **Epochs**: 5
- **Max Length**: 512
- **Dataset**: [llm-semantic-router/feedback-detector-dataset](https://huggingface.co/datasets/llm-semantic-router/feedback-detector-dataset)

## Use Cases

- **Conversational AI**: Understand whether users are satisfied with chatbot responses
- **Customer Support**: Route dissatisfied users to human agents
- **Quality Monitoring**: Track response quality across languages
- **Feedback Analysis**: Categorize user feedback automatically

## Related Models

- [llm-semantic-router/mmbert-feedback-detector-lora](https://huggingface.co/llm-semantic-router/mmbert-feedback-detector-lora) - LoRA adapter version
- [llm-semantic-router/mmbert-intent-classifier-merged](https://huggingface.co/llm-semantic-router/mmbert-intent-classifier-merged) - Intent classification
- [llm-semantic-router/mmbert-fact-check-merged](https://huggingface.co/llm-semantic-router/mmbert-fact-check-merged) - Fact checking
- [llm-semantic-router/mmbert-jailbreak-detector-merged](https://huggingface.co/llm-semantic-router/mmbert-jailbreak-detector-merged) - Security

## Citation

```bibtex
@misc{mmbert-feedback-detector,
  title={mmBERT Feedback Detector},
  author={vLLM Semantic Router Team},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/llm-semantic-router/mmbert-feedback-detector-merged}
}
```

## License

Apache 2.0
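The customer-support routing use case above amounts to a small post-processing step on the classifier's output. A minimal sketch of that gating logic, taking pipeline-style output dicts; the confidence threshold and route names are illustrative assumptions, not part of this model:

```python
# Minimal routing sketch for the "route dissatisfied users" use case.
# The 0.8 threshold and the route names are illustrative assumptions.
NEGATIVE = {"WRONG_ANSWER", "WANT_DIFFERENT"}

def route(prediction: dict, threshold: float = 0.8) -> str:
    """Map one pipeline output dict to a hypothetical routing decision."""
    label, score = prediction["label"], prediction["score"]
    if label in NEGATIVE and score >= threshold:
        return "human_agent"   # confidently dissatisfied -> escalate
    if label == "NEED_CLARIFICATION":
        return "clarify"       # ask the assistant to elaborate
    return "continue"          # satisfied, or low-confidence negative

# Example with a pipeline-style output dict:
print(route({"label": "WRONG_ANSWER", "score": 0.95}))  # -> human_agent
```

Feeding `classifier(...)[0]` from the pipeline example straight into `route` gives a per-turn escalation decision.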