Bhishaj committed · verified
Commit cfd36ca · 1 Parent(s): 9ed8a8a

Update README.md

Files changed (1): README.md (+48 −21)

README.md CHANGED
@@ -9,34 +9,64 @@ tags:
  - turboquant
  - gguf
  - edge-ai
+ - slm
+ - text-generation
  datasets:
  - Techmaestro369/indian-legal-texts-finetuning
  - bharatgenai/BhashaBench-Legal
+ - coild-aikosh/Judiciary_v2
+ base_model: meta-llama/Llama-3.2-1B-Instruct
  ---

  # ⚖️ Vidhik AI: Sovereign Legal SLM (1B)

- ## Model Summary
- Vidhik AI is a highly optimized, domain-specific Small Language Model (SLM) engineered for the Indian Judiciary and MSME sector. Fine-tuned on a 1B parameter base, it specializes in drafting formal legal notices (e.g., MSMED Act delayed payments) and navigating complex Indian officialese.
+ ![Model Size](https://img.shields.io/badge/Parameters-1B-blue)
+ ![Quantization](https://img.shields.io/badge/Quantization-GGUF_Q4__K__M-orange)
+ ![Hardware](https://img.shields.io/badge/Optimized_For-4GB_VRAM-green)
+ ![Framework](https://img.shields.io/badge/Framework-Unsloth_%7C_Transformers-red)

- **Developer:** Bhishaj Technologies (Gaurav)
- **Base Model:** Llama-3.2-1B-Instruct
- **Quantization:** 4-bit GGUF (Q4_K_M)
+ ## 📌 Model Summary
+ **Vidhik AI** is a highly optimized, domain-specific Small Language Model (SLM) engineered for the Indian Judiciary and MSME sector. Fine-tuned on a 1B-parameter base, it specializes in drafting formal legal notices (e.g., MSMED Act delayed payments), analyzing case law, and navigating complex Indian officialese ("Babu-speak").
+
+ Built with a focus on **edge compute**, the model is designed to run locally on highly constrained hardware (such as a 4GB GTX 1050) while retaining the ability to process massive context windows using Google TurboQuant.
+
+ * **Developer:** Gaurav / Bhishaj Technologies
+ * **Base Model:** Llama-3.2-1B-Instruct
+ * **Language(s):** English, Hindi (Indic legal terminology)
+ * **License:** Llama 3.2 Community License
+
+ ---

  ## 🛠️ Training & MLOps Architecture
- To bypass local hardware constraints, the model was trained using a hybrid cloud-edge pipeline:
- * **Compute:** Kaggle Dual T4 GPUs (32GB VRAM)
- * **Optimization:** Unsloth for 70% VRAM reduction during fine-tuning.
- * **Method:** PEFT/QLoRA instruction fine-tuning on `indian-legal-texts-finetuning`.
- * **Guardrails:** Model is trained with strict negative stop-sequences and deterministic decoding (`Temperature = 0.0`) to prevent MCQ-loop hallucinations.

- ## Edge Deployment & Google TurboQuant
- This model is specifically compiled to run on legacy/constrained hardware (e.g., NVIDIA GTX 1050 4GB).
+ To bypass local hardware constraints (4GB VRAM), the model was trained using a hybrid cloud-edge pipeline:
+
+ ### 1. Data Engineering
+ * **Corpus:** Curated and filtered Indian legal QA datasets (`Techmaestro369/indian-legal-texts-finetuning`) and multilingual judiciary data (`coild-aikosh/Judiciary_v2`).
+ * **Formatting:** Converted raw, unstructured legal texts into strict Alpaca/ShareGPT instruction formats for deterministic instruction following; a sketch of this mapping follows the list.
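
As a rough illustration of that formatting step. The `question`/`answer` field names are assumptions for the sake of the sketch, not the datasets' documented schema:

```python
# Illustrative only: map one raw legal QA record onto the Alpaca schema.
# The input field names ("question", "answer") are assumed, not confirmed.

def to_alpaca(record: dict) -> dict:
    """Convert a raw QA pair into an Alpaca-style instruction row."""
    return {
        "instruction": record["question"],
        "input": "",  # no separate context field in this sketch
        "output": record["answer"],
    }

example = {
    "question": "What does Section 15 of the MSMED Act require of a buyer?",
    "answer": "Payment on or before the agreed date, not exceeding 45 days from acceptance.",
}
print(to_alpaca(example))
```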
+
+ ### 2. Fine-Tuning Setup
+ * **Compute:** Kaggle dual T4 GPUs (32GB VRAM combined).
+ * **Optimization:** Used **Unsloth** for a 70% VRAM reduction during fine-tuning, accelerating training by roughly 2x.
+ * **Methodology:** Parameter-Efficient Fine-Tuning (PEFT) using **QLoRA**; a minimal setup sketch follows this list.
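
A minimal sketch of the described Unsloth + QLoRA setup. The LoRA hyperparameters, batch settings, and the `text` column name are illustrative assumptions; the card does not publish its exact training configuration:

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the base model in 4-bit (the QLoRA recipe) via Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-1B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,                    # quantized base weights
)

# Attach LoRA adapters; rank and target modules are assumed values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = load_dataset("Techmaestro369/indian-legal-texts-finetuning", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",            # assumed column name
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        output_dir="outputs",
    ),
)
trainer.train()
```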
+
+ ### 3. Guardrails & Alignment
+ * Trained with strict negative stop sequences and deterministic decoding (`Temperature = 0.0`) to cure the base model of MCQ-loop hallucinations.
+ * Aligned to a "Senior Advocate, Supreme Court of India" persona for formal, zero-fluff document generation; see the decoding sketch after this list.
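
In `transformers`, `Temperature = 0.0` is idiomatically expressed as greedy decoding (`do_sample=False`). A sketch of that decoding contract, using the base model id since the fine-tune's repo id is not shown here; the stop strings are hypothetical stand-ins for the card's unpublished guardrail list, and `stop_strings` requires a reasonably recent `transformers` release:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "meta-llama/Llama-3.2-1B-Instruct"   # base model; substitute the fine-tune's repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer(
    "TASK: Draft a legal notice under MSMED Act Section 15.",
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        do_sample=False,                 # greedy == deterministic decoding
        max_new_tokens=256,
        stop_strings=["(a)", "Q.1"],     # hypothetical MCQ-loop triggers
        tokenizer=tokenizer,             # required when stop_strings is set
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```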

- By utilizing **Google TurboQuant**, the model compresses the KV-cache to 3-bits during runtime, allowing for 128k context windows (essential for long Indian government gazettes) without triggering OOM (Out of Memory) crashes, maintaining a throughput of ~24.5 tokens/sec.
+ ---
+
+ ## ⚡ Edge Deployment & Google TurboQuant (2026)
+
+ This model is specifically compiled to run on legacy/constrained hardware.
+
+ By utilizing **Google TurboQuant**, the model compresses the KV-cache to 3 bits at runtime. This allows for **128k context windows** (essential for processing long Indian government gazettes and Supreme Court rulings) without triggering out-of-memory (OOM) crashes on a 4GB GPU, while maintaining a throughput of **~24.5 tokens/sec**; the sizing sketch below shows the arithmetic.
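
A back-of-envelope calculation of why the KV-cache, not the 1B weights, is the long-context bottleneck. The layer/head shape used is the published Llama 3.2 1B configuration (16 layers, 8 KV heads, head dim 64), treated here as an assumption; the 3-bit figure is the card's TurboQuant claim:

```python
# KV-cache sizing for a Llama-3.2-1B-shaped model at 128k context.
layers, kv_heads, head_dim, ctx = 16, 8, 64, 128_000

bytes_fp16 = 2 * layers * kv_heads * head_dim * 2 * ctx   # K and V, 2 bytes each
bytes_3bit = bytes_fp16 * 3 / 16                          # 3-bit quantized cache

print(f"fp16 KV-cache @ 128k tokens:  {bytes_fp16 / 2**30:.2f} GiB")  # ~3.91 GiB
print(f"3-bit KV-cache @ 128k tokens: {bytes_3bit / 2**30:.2f} GiB")  # ~0.73 GiB
```

At fp16, the cache alone would fill a 4GB card before any weights are loaded; at 3 bits it fits comfortably alongside a Q4_K_M model.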

- ### Python Usage (TurboQuant Enabled)
- ```python
+ ### 💻 Usage: Running Locally (TurboQuant Enabled)
+
+ To achieve roughly 6x KV-cache compression on your local machine:
+
+ ```python
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from turboquant import TurboQuantCache

@@ -51,16 +81,13 @@ tq_cache = TurboQuantCache(bits=4, compute_device="cuda")
  prompt = "TASK: Draft a formal legal notice for my client 'M/s Vidhik Electronics' under MSMED Act Sections 15 & 16."
  inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

+ # Generate with Compressed Context
  with torch.no_grad():
      outputs = model.generate(
          **inputs,
-         past_key_values=tq_cache,
+         past_key_values=tq_cache,  # Injecting the TurboQuant cache
          max_new_tokens=512,
          temperature=0.0
      )

- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- ```
-
- ## 📊 Evaluation
- Evaluated against **BhashaBench-Legal (BBL)** to ensure alignment with Indian judicial service standards and formal legal tonality.
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
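
The card also ships a GGUF Q4_K_M build; a minimal llama-cpp-python sketch for the same task follows. The filename is a placeholder for the repo's actual GGUF artifact, and this path uses llama.cpp's own cache handling rather than TurboQuant:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="vidhik-ai-1b.Q4_K_M.gguf",  # placeholder filename
    n_ctx=8192,           # a conservative context for a 4GB card
    n_gpu_layers=-1,      # offload all layers if VRAM allows
)

out = llm(
    "TASK: Draft a formal legal notice under MSMED Act Sections 15 & 16.",
    max_tokens=512,
    temperature=0.0,      # deterministic decoding, matching the card's guardrails
)
print(out["choices"][0]["text"])
```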