Initial model upload


Files changed (12)

README.md +186 -0
config.json +45 -0
inference_example.py +12 -0
label_mappings.json +22 -0
model.safetensors +3 -0
pytorch_model.bin +3 -0
requirements.txt +2 -0
special_tokens_map.json +7 -0
tokenizer.json +0 -0
tokenizer_config.json +56 -0
training_args.bin +3 -0
vocab.txt +0 -0

README.md ADDED

	@@ -0,0 +1,186 @@

+---
+language:
+- fr
+- en
+- multilingual
+license: apache-2.0
+tags:
+- text-classification
+- ticket-classification
+- customer-support
+- call-center
+- transformers
+- distilbert
+datasets:
+- custom-ticket-dataset
+metrics:
+- accuracy
+- f1
+model-index:
+- name: callcenter-ticket-classifier
+  results:
+  - task:
+      type: text-classification
+      name: Text Classification
+    metrics:
+    - type: accuracy
+      name: Accuracy
+      value: 0.95
+    - type: f1
+      name: F1 Score
+      value: 0.94
+---
+# 🎫 Call Center Ticket Classifier
+This model automatically classifies customer support tickets into 8 categories.
+## 📊 Categories
+The model can assign a ticket to one of the following categories:
+- **Hardware**
+- **Access**
+- **Miscellaneous**
+- **HR Support**
+- **Purchase**
+- **Administrative rights**
+- **Storage**
+- **Internal Project**
+## 🚀 Usage
+### Installation
+```bash
+pip install transformers torch
+```
+### Code Example
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+# Load the model and tokenizer
+model_name = "Kahouli/callcenter-ticket-classifier"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+# Prediction function: returns the top category and its softmax probability
+def classify_ticket(text):
+    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
+    with torch.no_grad():
+        outputs = model(**inputs)
+        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+    predicted_class_id = predictions.argmax().item()
+    confidence = predictions[0][predicted_class_id].item()
+    return {
+        "category": model.config.id2label[predicted_class_id],
+        "confidence": confidence
+    }
+# Example (French ticket: "My computer no longer starts")
+ticket_text = "Mon ordinateur ne démarre plus"
+result = classify_ticket(ticket_text)
+print(f"Category: {result['category']}")
+print(f"Confidence: {result['confidence']:.2%}")
+```
+### REST API with FastAPI
+```python
+from fastapi import FastAPI
+from pydantic import BaseModel
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+app = FastAPI()
+# Load the model once at startup
+model_name = "Kahouli/callcenter-ticket-classifier"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+class TicketRequest(BaseModel):
+    text: str
+class TicketResponse(BaseModel):
+    category: str
+    confidence: float
+@app.post("/classify", response_model=TicketResponse)
+async def classify_ticket(request: TicketRequest):
+    inputs = tokenizer(request.text, return_tensors="pt", padding=True, truncation=True, max_length=128)
+    with torch.no_grad():
+        outputs = model(**inputs)
+        predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+    predicted_class_id = predictions.argmax().item()
+    confidence = predictions[0][predicted_class_id].item()
+    return TicketResponse(
+        category=model.config.id2label[predicted_class_id],
+        confidence=confidence
+    )
+```
+## 🎯 Performance
+The model was trained on a dataset of customer support tickets. The model card metadata reports an accuracy of 0.95 and an F1 score of 0.94 for this multi-class classification task.
+## 🏗️ Architecture
+- **Base Model**: `distilbert-base-multilingual-cased`
+- **Task**: Sequence Classification
+- **Languages**: Multilingual (primarily French and English)
+- **Max Length**: 128 tokens
+- **Number of Classes**: 8
+## 📦 Model Details
+- **Developed by**: [Your Name]
+- **Model type**: DistilBERT for Sequence Classification
+- **Language(s)**: Multilingual
+- **License**: Apache 2.0
+- **Finetuned from**: `distilbert-base-multilingual-cased`
+## 🔧 Training
+The model was fine-tuned with the following hyperparameters:
+- Learning Rate: 2e-5
+- Batch Size: 16
+- Epochs: 3
+- Weight Decay: 0.01
+## ⚠️ Limitations and Bias
+- The model was trained on a specific dataset and may not generalize well to every type of ticket
+- Performance may vary with the length and complexity of the text
+- The model is optimized for French and English
+## 📝 Citation
+If you use this model in your research, please cite:
+```bibtex
+@misc{callcenter-ticket-classifier,
+  author = {Your Name},
+  title = {Call Center Ticket Classifier},
+  year = {2025},
+  publisher = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/Kahouli/callcenter-ticket-classifier}}
+}
+```
+## 🤝 Contributions
+Contributions are welcome! Feel free to open an issue or a pull request.
+## 📧 Contact
+For questions or suggestions, contact me via [your email or profile].
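
The `classify_ticket` function in the README applies a softmax to the model logits, takes the arg-max as the predicted class, and reports that class's probability as the confidence. The sketch below reproduces only that post-processing step in plain Python, so it runs without downloading the model. The logit values are made up for illustration, and `logits_to_prediction` is a hypothetical helper name that does not exist in this repository.

```python
import math

# Label order copied from id2label in config.json.
ID2LABEL = {
    0: "Hardware",
    1: "Access",
    2: "Miscellaneous",
    3: "HR Support",
    4: "Purchase",
    5: "Administrative rights",
    6: "Storage",
    7: "Internal Project",
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_prediction(logits):
    """Hypothetical helper: map one row of logits to category + confidence."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"category": ID2LABEL[best], "confidence": probs[best]}


# Made-up logits for a single ticket (illustrative only, not real model output).
example_logits = [4.1, 0.3, -0.5, -1.2, 0.0, 0.8, -0.7, -2.0]
prediction = logits_to_prediction(example_logits)
print(f"Category: {prediction['category']}")
print(f"Confidence: {prediction['confidence']:.2%}")
```

Because the confidence is a softmax probability over the 8 classes, it sums to 1 across categories and should not be read as a calibrated likelihood unless calibration has been checked separately.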

config.json ADDED

	@@ -0,0 +1,45 @@

+{
+  "activation": "gelu",
+  "architectures": [
+    "DistilBertForSequenceClassification"
+  ],
+  "attention_dropout": 0.1,
+  "dim": 768,
+  "dropout": 0.1,
+  "dtype": "float32",
+  "hidden_dim": 3072,
+  "id2label": {
+    "0": "Hardware",
+    "1": "Access",
+    "2": "Miscellaneous",
+    "3": "HR Support",
+    "4": "Purchase",
+    "5": "Administrative rights",
+    "6": "Storage",
+    "7": "Internal Project"
+  },
+  "initializer_range": 0.02,
+  "label2id": {
+    "Access": 1,
+    "Administrative rights": 5,
+    "HR Support": 3,
+    "Hardware": 0,
+    "Internal Project": 7,
+    "Miscellaneous": 2,
+    "Purchase": 4,
+    "Storage": 6
+  },
+  "max_position_embeddings": 512,
+  "model_type": "distilbert",
+  "n_heads": 12,
+  "n_layers": 6,
+  "output_past": true,
+  "pad_token_id": 0,
+  "problem_type": "single_label_classification",
+  "qa_dropout": 0.1,
+  "seq_classif_dropout": 0.2,
+  "sinusoidal_pos_embds": false,
+  "tie_weights_": true,
+  "transformers_version": "4.57.1",
+  "vocab_size": 119547
+}

inference_example.py ADDED

	@@ -0,0 +1,12 @@

+# Simple inference example
+from transformers import pipeline
+# Load the pipeline from the current directory (the cloned model repo)
+classifier = pipeline("text-classification", model="./")
+# Classify a ticket (French: "My printer no longer works")
+text = "Mon imprimante ne fonctionne plus"
+result = classifier(text)
+print(f"Category: {result[0]['label']}")
+print(f"Confidence: {result[0]['score']:.2%}")
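
The pipeline call in `inference_example.py` returns a list of dictionaries with `label` and `score` keys. One possible follow-up, sketched here as an assumption rather than something this repository ships, is to send low-confidence tickets to manual review instead of trusting the top label. The `route_ticket` helper, the `0.6` threshold, and the `manual_review` queue name are all hypothetical, and the sample results are hand-written stand-ins for real pipeline output.

```python
def route_ticket(result, threshold=0.6):
    """Pick a queue from pipeline-style output: [{'label': str, 'score': float}].

    Hypothetical routing rule: below the threshold, fall back to manual review.
    """
    top = result[0]
    if top["score"] < threshold:
        return "manual_review"
    return top["label"]


# Hand-written stand-ins for what classifier(text) might return.
confident = [{"label": "Hardware", "score": 0.93}]
uncertain = [{"label": "Storage", "score": 0.41}]

print(route_ticket(confident))
print(route_ticket(uncertain))
```

A suitable threshold would have to be chosen on held-out tickets; the value used here is only a placeholder.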

label_mappings.json ADDED

	@@ -0,0 +1,22 @@

+{
+  "label2id": {
+    "Hardware": 0,
+    "Access": 1,
+    "Miscellaneous": 2,
+    "HR Support": 3,
+    "Purchase": 4,
+    "Administrative rights": 5,
+    "Storage": 6,
+    "Internal Project": 7
+  },
+  "id2label": {
+    "0": "Hardware",
+    "1": "Access",
+    "2": "Miscellaneous",
+    "3": "HR Support",
+    "4": "Purchase",
+    "5": "Administrative rights",
+    "6": "Storage",
+    "7": "Internal Project"
+  }
+}
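
`label_mappings.json` duplicates the `label2id` and `id2label` tables from `config.json`, so the two directions can drift apart if one is edited by hand. A minimal sketch of a consistency check follows, with the JSON inlined from the file above so that it runs standalone; `check_mappings` is a hypothetical helper, not part of the repository.

```python
import json

# Contents of label_mappings.json, inlined so the check runs standalone.
LABEL_MAPPINGS = json.loads("""
{
  "label2id": {
    "Hardware": 0, "Access": 1, "Miscellaneous": 2, "HR Support": 3,
    "Purchase": 4, "Administrative rights": 5, "Storage": 6,
    "Internal Project": 7
  },
  "id2label": {
    "0": "Hardware", "1": "Access", "2": "Miscellaneous", "3": "HR Support",
    "4": "Purchase", "5": "Administrative rights", "6": "Storage",
    "7": "Internal Project"
  }
}
""")


def check_mappings(mappings):
    """Return a list of problems; an empty list means the maps are inverses."""
    problems = []
    label2id = mappings["label2id"]
    id2label = mappings["id2label"]
    if len(label2id) != len(id2label):
        problems.append("label2id and id2label differ in size")
    for label, idx in label2id.items():
        # JSON object keys are strings, so ids must be looked up as str.
        if id2label.get(str(idx)) != label:
            problems.append(f"{label!r} -> {idx} has no matching id2label entry")
    if sorted(int(k) for k in id2label) != list(range(len(id2label))):
        problems.append("ids are not contiguous starting at 0")
    return problems


problems = check_mappings(LABEL_MAPPINGS)
print("mappings consistent" if not problems else problems)
```

The same function could be pointed at the `id2label` and `label2id` entries loaded from `config.json` to confirm that both files agree.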

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:45d2e27d4d809ab2ec63406a945ebc23da57d1eeef9444a414910b5a2ae84510
+size 541335832
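
The size in this LFS pointer can be cross-checked against `config.json`. The sketch below counts parameters from the config values, assuming the standard Hugging Face `DistilBertForSequenceClassification` layout (word and position embeddings with a LayerNorm, six transformer blocks with biased linear layers, then a `dim`-to-`dim` pre-classifier and a `dim`-to-8 classifier) and float32 storage as stated by `"dtype": "float32"`. The `num_labels` entry is not a literal config key; it is taken from the 8 entries of `id2label`. Helper names are hypothetical.

```python
# Values copied from config.json above; num_labels is derived from id2label.
CONFIG = {
    "vocab_size": 119547,
    "max_position_embeddings": 512,
    "dim": 768,
    "hidden_dim": 3072,
    "n_layers": 6,
    "num_labels": 8,
}
SAFETENSORS_BYTES = 541335832  # "size" field of the model.safetensors pointer
BYTES_PER_PARAM = 4            # float32


def linear(n_in, n_out):
    """Parameters of a dense layer with bias."""
    return n_in * n_out + n_out


def layer_norm(dim):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


def count_parameters(cfg):
    """Parameter count under the assumed DistilBERT classification layout."""
    d, h = cfg["dim"], cfg["hidden_dim"]
    embeddings = (cfg["vocab_size"] + cfg["max_position_embeddings"]) * d + layer_norm(d)
    attention = 4 * linear(d, d)            # query, key, value, output projections
    feed_forward = linear(d, h) + linear(h, d)
    block = attention + layer_norm(d) + feed_forward + layer_norm(d)
    head = linear(d, d) + linear(d, cfg["num_labels"])
    return embeddings + cfg["n_layers"] * block + head


n_params = count_parameters(CONFIG)
weight_bytes = n_params * BYTES_PER_PARAM
print(f"parameters:   {n_params:,}")
print(f"weight bytes: {weight_bytes:,}")
print(f"file bytes:   {SAFETENSORS_BYTES:,}")
print(f"difference:   {SAFETENSORS_BYTES - weight_bytes:,}")
```

Under these assumptions the model has roughly 135 million parameters, and the small remainder between the file size and the raw weight bytes is what one would expect from the safetensors header that precedes the tensor data.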

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d047e2afd8b0fba80510b702006df462b61ad31029aef97f2be969e58ce664f9
+size 541364355

requirements.txt ADDED

	@@ -0,0 +1,2 @@


+transformers>=4.30.0
+torch>=2.0.0

special_tokens_map.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "cls_token": "[CLS]",
+  "mask_token": "[MASK]",
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "unk_token": "[UNK]"
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,56 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "clean_up_tokenization_spaces": false,
+  "cls_token": "[CLS]",
+  "do_lower_case": false,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "model_max_length": 512,
+  "pad_token": "[PAD]",
+  "sep_token": "[SEP]",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "DistilBertTokenizer",
+  "unk_token": "[UNK]"
+}

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:01868e94c7ee898cd33e9096755e10e1a849879be6f718356948e5ae9106823b
+size 5905

vocab.txt ADDED

The diff for this file is too large to render. See raw diff