Commit 743eda6 by canpolatbulbul · verified · 1 Parent(s): e0edbb8

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +13 -36
README.md CHANGED
@@ -30,60 +30,37 @@ Evaluated via **5-fold document-level cross-validation** on 80 Turkish banking c
  | Micro-F1 (CV) | **0.6657** |
  | Macro-F1 (CV) | **0.6226** |

- ### Per-Class F1
+ ### Per-class F1 (5-fold CV)

  | Category | F1 |
  |----------|----|
  | hidden_fees | 0.76 |
+ | dispute_limitation | 0.61 |
  | broad_collateral | 0.74 |
- | unilateral_rate_change | 0.72 |
  | default_escalation | 0.71 |
- | data_sharing | 0.69 |
- | currency_risk | 0.68 |
  | account_freeze | 0.65 |
- | early_payment_penalty | 0.64 |
- | dispute_limitation | 0.61 |
- | bundled_insurance | 0.59 |
+ | currency_risk | 0.68 |
  | unilateral_terms_change | 0.57 |
- | overdraft_penalty | 0.51 |
- | cross_default | 0.44 |
+ | unilateral_rate_change | 0.72 |
+ | data_sharing | 0.69 |
  | auto_enrollment | 0.42 |
+ | cross_default | 0.44 |
+ | overdraft_penalty | 0.51 |
+ | early_payment_penalty | 0.64 |
+ | bundled_insurance | 0.59 |

  ## Training

- - **Base:** dbmdz/bert-base-turkish-cased
+ - **Base model:** dbmdz/bert-base-turkish-cased
  - **Loss:** Multi-label Focal Loss (gamma=2.0, alpha=0.75)
- - **Optimizer:** AdamW, lr=2e-5, weight_decay=0.01, epochs=10
- - **Data:** 80 Turkish banking contracts, 7,020 clauses, 18 banks
+ - **Optimizer:** AdamW, lr=2e-5, weight_decay=0.01
+ - **Epochs:** 10
+ - **Training data:** 80 Turkish banking contract PDFs, 7,020 clauses (Akbank, Garanti, İşbank, DenizBank, Halkbank, VakıfBank, QNB, TEB, YapıKredi, Ziraat, ING, KuveytTürk, Şekerbank, HSBC Turkey, Odea Bank, Albaraka Türk, Türkiye Finans, Burgan Bank and more)
  - **Threshold:** Fixed 0.5 per label

  ## Usage

- ```python
- from transformers import AutoTokenizer, AutoModelForSequenceClassification
- import torch
-
- LABELS = [
-     "hidden_fees", "dispute_limitation", "broad_collateral", "default_escalation",
-     "account_freeze", "currency_risk", "unilateral_terms_change", "unilateral_rate_change",
-     "data_sharing", "auto_enrollment", "cross_default", "overdraft_penalty",
-     "early_payment_penalty", "bundled_insurance",
- ]
-
- tokenizer = AutoTokenizer.from_pretrained("Agreemind/banking-bert-turkish")
- model = AutoModelForSequenceClassification.from_pretrained("Agreemind/banking-bert-turkish")
- model.eval()
-
- text = "Banka, hesap işletim ücretini önceden bildirmeksizin değiştirme hakkını saklı tutar."
- inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
-
- with torch.no_grad():
-     probs = torch.sigmoid(model(**inputs).logits).squeeze()

- for label, prob in zip(LABELS, probs):
-     if prob > 0.5:
-         print(f"{label}: {prob:.3f}")
- ```

  ## License
  MIT
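
The Training list in this README names a multi-label focal loss (gamma=2.0, alpha=0.75) and a fixed 0.5 threshold per label, but the commit does not show how either is implemented. The block below is a minimal pure-Python sketch of one common formulation, not the repository's training code. The function names `sigmoid`, `multilabel_focal_loss`, and `predict_labels` are hypothetical, and the convention that `alpha` weights positive labels while `1 - alpha` weights negative labels is an assumption.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multilabel_focal_loss(logits, targets, gamma=2.0, alpha=0.75):
    """Mean binary focal loss over the labels of a single example.

    Assumption: alpha weights positive labels and (1 - alpha) weights
    negative labels; the model card does not state which convention it uses.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        # Probability the model assigns to the true outcome for this label.
        p_t = p if y == 1 else 1.0 - p
        alpha_t = alpha if y == 1 else 1.0 - alpha
        p_t = max(p_t, 1e-12)  # guard against log(0)
        # (1 - p_t) ** gamma down-weights labels that are already well classified.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(logits)


def predict_labels(logits, labels, threshold=0.5):
    """Return {label: probability} for labels whose probability exceeds the threshold."""
    probs = [sigmoid(z) for z in logits]
    return {name: p for name, p in zip(labels, probs) if p > threshold}
```

With `gamma=0.0` and `alpha=0.5` the loss reduces to half of ordinary binary cross-entropy, which is a quick sanity check for any reimplementation. Because the sigmoid is monotonic, the fixed 0.5 threshold is equivalent to keeping every label whose logit is positive.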