Add Key Results summary block

README.md (CHANGED)

```diff
@@ -40,7 +40,16 @@ Dystrio Sculpt produces dense compiled variants of existing models that:
 - require no custom kernels
 - load with standard HuggingFace Transformers
 
+## Key Results
+Compared to **mistralai/Mistral-7B-v0.1** baseline on an **A100 80GB**:
+
+- **Weights memory:** **-11% (Conservative)** / **-23% (Balanced)**
+- **RAG latency (TTFT p95):** **-7% / -14%**
+- **Decode throughput:** ~flat
+- **No runtime changes:** no custom kernels, no new ops, standard `transformers` loading
+
+> Notes: TTFT includes prefill + first decode step. “Weights memory” is computed from parameter sizes (GiB) and is workload-independent.
 
 
 
 ## Benchmark Results
```
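The note in the added section says the weights-memory figure is derived from parameter sizes alone, independent of workload. A minimal sketch of that computation, assuming fp16/bf16 weights (2 bytes per parameter); the `weights_gib` helper, the approximate parameter count, and the reuse of the quoted -11% / -23% factors are illustrative assumptions, not code from this repo:

```python
# Illustrative sketch: a workload-independent "weights memory" figure (GiB)
# computed from parameter count and dtype width alone.

def weights_gib(num_params: int, bytes_per_param: int = 2) -> float:
    """Weights memory in GiB; 2 bytes/param assumes fp16/bf16 weights."""
    return num_params * bytes_per_param / 2**30

baseline = weights_gib(7_240_000_000)   # ~7.24B params (approximate)
conservative = baseline * (1 - 0.11)    # -11% weights memory (Conservative)
balanced = baseline * (1 - 0.23)        # -23% weights memory (Balanced)
```

Because the figure depends only on parameter sizes, it holds for any prompt length or batch size, unlike activation or KV-cache memory.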
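The notes also define TTFT as prefill plus the first decode step. A hedged sketch of how a p95 TTFT number like the one quoted could be collected; `generate_first_token` is a hypothetical stand-in for a real model call, and the nearest-rank percentile is one of several reasonable p95 definitions:

```python
import time
from typing import Callable, Iterable

def p95(samples: list[float]) -> float:
    """Nearest-rank-style 95th percentile of a list of samples."""
    s = sorted(samples)
    return s[int(0.95 * (len(s) - 1))]

def ttft_p95(generate_first_token: Callable[[str], None],
             prompts: Iterable[str]) -> float:
    """Wall-clock time to first token (prefill + one decode step) per prompt."""
    ttfts = []
    for prompt in prompts:
        start = time.perf_counter()
        generate_first_token(prompt)   # prefill + first decode step
        ttfts.append(time.perf_counter() - start)
    return p95(ttfts)
```

Timing through to the first emitted token is what makes the metric sensitive to prefill cost, which is why RAG-style long-prompt workloads show the improvement.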