Upload folder using huggingface_hub
README.md
@@ -30,60 +30,37 @@ Evaluated via **5-fold document-level cross-validation** on 80 Turkish banking c
 | Micro-F1 (CV) | **0.6657** |
 | Macro-F1 (CV) | **0.6226** |

-### Per-
+### Per-class F1 (5-fold CV)

 | Category | F1 |
 |----------|----|
 | hidden_fees | 0.76 |
+| dispute_limitation | 0.61 |
 | broad_collateral | 0.74 |
-| unilateral_rate_change | 0.72 |
 | default_escalation | 0.71 |
-| data_sharing | 0.69 |
-| currency_risk | 0.68 |
 | account_freeze | 0.65 |
-
-| dispute_limitation | 0.61 |
-| bundled_insurance | 0.59 |
+| currency_risk | 0.68 |
 | unilateral_terms_change | 0.57 |
-
-
+| unilateral_rate_change | 0.72 |
+| data_sharing | 0.69 |
 | auto_enrollment | 0.42 |
+| cross_default | 0.44 |
+| overdraft_penalty | 0.51 |
+| early_payment_penalty | 0.64 |
+| bundled_insurance | 0.59 |

 ## Training

-- **Base:** dbmdz/bert-base-turkish-cased
+- **Base model:** dbmdz/bert-base-turkish-cased
 - **Loss:** Multi-label Focal Loss (gamma=2.0, alpha=0.75)
-- **Optimizer:** AdamW, lr=2e-5, weight_decay=0.01
-- **
+- **Optimizer:** AdamW, lr=2e-5, weight_decay=0.01
+- **Epochs:** 10
+- **Training data:** 80 Turkish banking contract PDFs, 7,020 clauses (Akbank, Garanti, İşbank, DenizBank, Halkbank, VakıfBank, QNB, TEB, YapıKredi, Ziraat, ING, KuveytTürk, Şekerbank, HSBC Turkey, Odea Bank, Albaraka Türk, Türkiye Finans, Burgan Bank and more)
 - **Threshold:** Fixed 0.5 per label

 ## Usage

-```python
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-import torch
-
-LABELS = [
-    "hidden_fees", "dispute_limitation", "broad_collateral", "default_escalation",
-    "account_freeze", "currency_risk", "unilateral_terms_change", "unilateral_rate_change",
-    "data_sharing", "auto_enrollment", "cross_default", "overdraft_penalty",
-    "early_payment_penalty", "bundled_insurance",
-]
-
-tokenizer = AutoTokenizer.from_pretrained("Agreemind/banking-bert-turkish")
-model = AutoModelForSequenceClassification.from_pretrained("Agreemind/banking-bert-turkish")
-model.eval()
-
-text = "Banka, hesap işletim ücretini önceden bildirmeksizin değiştirme hakkını saklı tutar."
-inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
-
-with torch.no_grad():
-    probs = torch.sigmoid(model(**inputs).logits).squeeze()

-for label, prob in zip(LABELS, probs):
-    if prob > 0.5:
-        print(f"{label}: {prob:.3f}")
-```

 ## License
 MIT
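
As a quick sanity check on the metrics in this diff, the sketch below averages the 14 per-class F1 values from the added side of the table and compares the result with the reported Macro-F1 (CV) of 0.6226. It assumes macro-F1 is the unweighted mean of per-class F1; the README does not say whether scores were averaged per fold first, and the table values are rounded to two decimals, so a small gap is expected.

```python
# Per-class F1 values copied from the updated README table (rounded to 2 decimals).
per_class_f1 = {
    "hidden_fees": 0.76,
    "dispute_limitation": 0.61,
    "broad_collateral": 0.74,
    "default_escalation": 0.71,
    "account_freeze": 0.65,
    "currency_risk": 0.68,
    "unilateral_terms_change": 0.57,
    "unilateral_rate_change": 0.72,
    "data_sharing": 0.69,
    "auto_enrollment": 0.42,
    "cross_default": 0.44,
    "overdraft_penalty": 0.51,
    "early_payment_penalty": 0.64,
    "bundled_insurance": 0.59,
}

# Assumption: macro-F1 is the plain mean over classes.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
reported_macro_f1 = 0.6226  # value stated in the README

print(f"{macro_f1:.4f}")  # 0.6236
print(f"gap vs reported: {abs(macro_f1 - reported_macro_f1):.4f}")
```

The recomputed mean lands within about 0.001 of the reported figure, which is consistent with rounding in the table.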
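
The Training section names a multi-label focal loss with gamma=2.0 and alpha=0.75 but does not show it. The following is a minimal plain-Python sketch of the standard per-label focal term, not the repository's actual training code. The function names are hypothetical, and two details are assumptions: that alpha weights the positive class (with 1 - alpha on negatives) and that the loss is averaged over labels.

```python
import math


def focal_term(logit, target, gamma=2.0, alpha=0.75):
    """Focal loss for one label (hypothetical helper, standard formulation)."""
    p = 1.0 / (1.0 + math.exp(-logit))            # sigmoid probability of the label
    p_t = p if target == 1 else 1.0 - p           # probability given to the true outcome
    # Assumption: alpha weights positives, (1 - alpha) weights negatives.
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified labels.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def multilabel_focal_loss(logits, targets, gamma=2.0, alpha=0.75):
    """Mean focal term over all labels of one clause (mean reduction is an assumption)."""
    terms = [focal_term(z, y, gamma, alpha) for z, y in zip(logits, targets)]
    return sum(terms) / len(terms)


# Example: three labels, the first one positive.
loss = multilabel_focal_loss([2.0, -1.5, 0.0], [1, 0, 0])
print(f"{loss:.4f}")
```

With gamma=2.0, a confidently correct label contributes far less than an uncertain one, which is the usual motivation for focal loss on imbalanced labels such as the rarer categories in the table.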