Bhishaj committed · verified
Commit cfd36ca · 1 Parent(s): 9ed8a8a

Update README.md

Files changed (1): README.md (+48 −21)

README.md CHANGED
@@ -9,34 +9,64 @@ tags:
  - turboquant
  - gguf
  - edge-ai
+ - slm
+ - text-generation
  datasets:
  - Techmaestro369/indian-legal-texts-finetuning
  - bharatgenai/BhashaBench-Legal
+ - coild-aikosh/Judiciary_v2
+ base_model: meta-llama/Llama-3.2-1B-Instruct
  ---

  # ⚖️ Vidhik AI: Sovereign Legal SLM (1B)

- ## Model Summary
- Vidhik AI is a highly optimized, domain-specific Small Language Model (SLM) engineered for the Indian Judiciary and MSME sector. Fine-tuned on a 1B parameter base, it specializes in drafting formal legal notices (e.g., MSMED Act delayed payments) and navigating complex Indian officialese.
+ ![Model Size](https://img.shields.io/badge/Parameters-1B-blue)
+ ![Quantization](https://img.shields.io/badge/Quantization-GGUF_Q4__K__M-orange)
+ ![Hardware](https://img.shields.io/badge/Optimized_For-4GB_VRAM-green)
+ ![Framework](https://img.shields.io/badge/Framework-Unsloth_%7C_Transformers-red)

- **Developer:** Bhishaj Technologies (Gaurav)
- **Base Model:** Llama-3.2-1B-Instruct
- **Quantization:** 4-bit GGUF (Q4_K_M)
+ ## 📌 Model Summary
+ **Vidhik AI** is a highly optimized, domain-specific Small Language Model (SLM) engineered for the Indian Judiciary and MSME sector. Fine-tuned on a 1B-parameter base, it specializes in drafting formal legal notices (e.g., MSMED Act delayed payments), analyzing case law, and navigating complex Indian officialese ("Babu-speak").
+
+ Built with a focus on **edge compute**, the model is designed to run locally on highly constrained hardware (such as a 4GB GTX 1050) while retaining the ability to process massive context windows using Google TurboQuant.
+
+ * **Developer:** Gaurav / Bhishaj Technologies
+ * **Base Model:** Llama-3.2-1B-Instruct
+ * **Language(s):** English, Hindi (Indic legal terminology)
+ * **License:** Llama 3.2 Community License
+
+ ---

  ## 🛠️ Training & MLOps Architecture
- To bypass local hardware constraints, the model was trained using a hybrid cloud-edge pipeline:
- * **Compute:** Kaggle Dual T4 GPUs (32GB VRAM)
- * **Optimization:** Unsloth for 70% VRAM reduction during fine-tuning.
- * **Method:** PEFT/QLoRA instruction fine-tuning on `indian-legal-texts-finetuning`.
- * **Guardrails:** Model is trained with strict negative stop-sequences and deterministic decoding (`Temperature = 0.0`) to prevent MCQ-loop hallucinations.

- ## Edge Deployment & Google TurboQuant
- This model is specifically compiled to run on legacy/constrained hardware (e.g., NVIDIA GTX 1050 4GB).
+ To bypass local hardware constraints (4GB VRAM), the model was trained using a hybrid cloud-edge pipeline:
+
+ ### 1. Data Engineering
+ * **Corpus:** Curated and filtered Indian legal QA datasets (`Techmaestro369/indian-legal-texts-finetuning`) and multilingual judiciary data (`coild-aikosh/Judiciary_v2`).
+ * **Formatting:** Converted raw, unstructured legal texts into strict Alpaca/ShareGPT instruction formats for deterministic instruction following; a sketch of this mapping follows the list.
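
As a rough illustration of that formatting step. The `question`/`answer` field names are assumptions for the sake of the sketch, not the datasets' documented schema:

```python
# Illustrative only: map one raw legal QA record onto the Alpaca schema.
# The input field names ("question", "answer") are assumed, not confirmed.

def to_alpaca(record: dict) -> dict:
    """Convert a raw QA pair into an Alpaca-style instruction row."""
    return {
        "instruction": record["question"],
        "input": "",  # no separate context field in this sketch
        "output": record["answer"],
    }

example = {
    "question": "What does Section 15 of the MSMED Act require of a buyer?",
    "answer": "Payment on or before the agreed date, not exceeding 45 days from acceptance.",
}
print(to_alpaca(example))
```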
+
+ ### 2. Fine-Tuning Setup
+ * **Compute:** Kaggle dual T4 GPUs (32GB VRAM combined).
+ * **Optimization:** Used **Unsloth** for a 70% VRAM reduction during fine-tuning, accelerating training by roughly 2x.
+ * **Methodology:** Parameter-Efficient Fine-Tuning (PEFT) using **QLoRA**; a minimal setup sketch follows this list.
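
A minimal sketch of the described Unsloth + QLoRA setup. The LoRA hyperparameters, batch settings, and the `text` column name are illustrative assumptions; the card does not publish its exact training configuration:

```python
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

# Load the base model in 4-bit (the QLoRA recipe) via Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Llama-3.2-1B-Instruct",
    max_seq_length=2048,
    load_in_4bit=True,                    # quantized base weights
)

# Attach LoRA adapters; rank and target modules are assumed values.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

dataset = load_dataset("Techmaestro369/indian-legal-texts-finetuning", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",            # assumed column name
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        output_dir="outputs",
    ),
)
trainer.train()
```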
+
+ ### 3. Guardrails & Alignment
+ * Trained with strict negative stop sequences and deterministic decoding (`Temperature = 0.0`) to cure the base model of MCQ-loop hallucinations.
+ * Aligned to a "Senior Advocate, Supreme Court of India" persona for formal, zero-fluff document generation; see the decoding sketch after this list.
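
In `transformers`, `Temperature = 0.0` is idiomatically expressed as greedy decoding (`do_sample=False`). A sketch of that decoding contract, using the base model id since the fine-tune's repo id is not shown here; the stop strings are hypothetical stand-ins for the card's unpublished guardrail list, and `stop_strings` requires a reasonably recent `transformers` release:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "meta-llama/Llama-3.2-1B-Instruct"   # base model; substitute the fine-tune's repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer(
    "TASK: Draft a legal notice under MSMED Act Section 15.",
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        do_sample=False,                 # greedy == deterministic decoding
        max_new_tokens=256,
        stop_strings=["(a)", "Q.1"],     # hypothetical MCQ-loop triggers
        tokenizer=tokenizer,             # required when stop_strings is set
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```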

- By utilizing **Google TurboQuant**, the model compresses the KV-cache to 3-bits during runtime, allowing for 128k context windows (essential for long Indian government gazettes) without triggering OOM (Out of Memory) crashes, maintaining a throughput of ~24.5 tokens/sec.
+ ---
+
+ ## ⚡ Edge Deployment & Google TurboQuant (2026)
+
+ This model is specifically compiled to run on legacy/constrained hardware.
+
+ By utilizing **Google TurboQuant**, the model compresses the KV-cache to 3 bits at runtime. This allows for **128k context windows** (essential for processing long Indian government gazettes and Supreme Court rulings) without triggering out-of-memory (OOM) crashes on a 4GB GPU, while maintaining a throughput of **~24.5 tokens/sec**; the sizing sketch below shows the arithmetic.
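
A back-of-envelope calculation of why the KV-cache, not the 1B weights, is the long-context bottleneck. The layer/head shape used is the published Llama 3.2 1B configuration (16 layers, 8 KV heads, head dim 64), treated here as an assumption; the 3-bit figure is the card's TurboQuant claim:

```python
# KV-cache sizing for a Llama-3.2-1B-shaped model at 128k context.
layers, kv_heads, head_dim, ctx = 16, 8, 64, 128_000

bytes_fp16 = 2 * layers * kv_heads * head_dim * 2 * ctx   # K and V, 2 bytes each
bytes_3bit = bytes_fp16 * 3 / 16                          # 3-bit quantized cache

print(f"fp16 KV-cache @ 128k tokens:  {bytes_fp16 / 2**30:.2f} GiB")  # ~3.91 GiB
print(f"3-bit KV-cache @ 128k tokens: {bytes_3bit / 2**30:.2f} GiB")  # ~0.73 GiB
```

At fp16, the cache alone would fill a 4GB card before any weights are loaded; at 3 bits it fits comfortably alongside a Q4_K_M model.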

- ### Python Usage (TurboQuant Enabled)
- ```python
+ ### 💻 Usage: Running Locally (TurboQuant Enabled)
+
+ To achieve roughly 6x KV-cache compression on your local machine:
+
+ ```python
  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer
  from turboquant import TurboQuantCache

@@ -51,16 +81,13 @@ tq_cache = TurboQuantCache(bits=4, compute_device="cuda")
  prompt = "TASK: Draft a formal legal notice for my client 'M/s Vidhik Electronics' under MSMED Act Sections 15 & 16."
  inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

+ # Generate with Compressed Context
  with torch.no_grad():
      outputs = model.generate(
          **inputs,
-         past_key_values=tq_cache,
+         past_key_values=tq_cache,  # Injecting the TurboQuant cache
          max_new_tokens=512,
          temperature=0.0
      )

- print(tokenizer.decode(outputs[0], skip_special_tokens=True))
- ```
-
- ## 📊 Evaluation
- Evaluated against **BhashaBench-Legal (BBL)** to ensure alignment with Indian judicial service standards and formal legal tonality.
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
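
The card also ships a GGUF Q4_K_M build; a minimal llama-cpp-python sketch for the same task follows. The filename is a placeholder for the repo's actual GGUF artifact, and this path uses llama.cpp's own cache handling rather than TurboQuant:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="vidhik-ai-1b.Q4_K_M.gguf",  # placeholder filename
    n_ctx=8192,           # a conservative context for a 4GB card
    n_gpu_layers=-1,      # offload all layers if VRAM allows
)

out = llm(
    "TASK: Draft a formal legal notice under MSMED Act Sections 15 & 16.",
    max_tokens=512,
    temperature=0.0,      # deterministic decoding, matching the card's guardrails
)
print(out["choices"][0]["text"])
```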