Add Key Results summary block

README.md (CHANGED)

```diff
@@ -40,7 +40,16 @@ Dystrio Sculpt produces dense compiled variants of existing models that:
 - require no custom kernels
 - load with standard HuggingFace Transformers
 
+## Key Results
+Compared to **mistralai/Mistral-7B-v0.1** baseline on an **A100 80GB**:
+
+- **Weights memory:** **-11% (Conservative)** / **-23% (Balanced)**
+- **RAG latency (TTFT p95):** **-7% / -14%**
+- **Decode throughput:** ~flat
+- **No runtime changes:** no custom kernels, no new ops, standard `transformers` loading
+
+> Notes: TTFT includes prefill + first decode step. “Weights memory” is computed from parameter sizes (GiB) and is workload-independent.
 
 
 
 ## Benchmark Results
```
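The note in the added section says the weights-memory figure is derived from parameter sizes alone, independent of workload. A minimal sketch of that computation, assuming fp16/bf16 weights (2 bytes per parameter); the `weights_gib` helper, the approximate parameter count, and the reuse of the quoted -11% / -23% factors are illustrative assumptions, not code from this repo:

```python
# Illustrative sketch: a workload-independent "weights memory" figure (GiB)
# computed from parameter count and dtype width alone.

def weights_gib(num_params: int, bytes_per_param: int = 2) -> float:
    """Weights memory in GiB; 2 bytes/param assumes fp16/bf16 weights."""
    return num_params * bytes_per_param / 2**30

baseline = weights_gib(7_240_000_000)   # ~7.24B params (approximate)
conservative = baseline * (1 - 0.11)    # -11% weights memory (Conservative)
balanced = baseline * (1 - 0.23)        # -23% weights memory (Balanced)
```

Because the figure depends only on parameter sizes, it holds for any prompt length or batch size, unlike activation or KV-cache memory.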
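The notes also define TTFT as prefill plus the first decode step. A hedged sketch of how a p95 TTFT number like the one quoted could be collected; `generate_first_token` is a hypothetical stand-in for a real model call, and the nearest-rank percentile is one of several reasonable p95 definitions:

```python
import time
from typing import Callable, Iterable

def p95(samples: list[float]) -> float:
    """Nearest-rank-style 95th percentile of a list of samples."""
    s = sorted(samples)
    return s[int(0.95 * (len(s) - 1))]

def ttft_p95(generate_first_token: Callable[[str], None],
             prompts: Iterable[str]) -> float:
    """Wall-clock time to first token (prefill + one decode step) per prompt."""
    ttfts = []
    for prompt in prompts:
        start = time.perf_counter()
        generate_first_token(prompt)   # prefill + first decode step
        ttfts.append(time.perf_counter() - start)
    return p95(ttfts)
```

Timing through to the first emitted token is what makes the metric sensitive to prefill cost, which is why RAG-style long-prompt workloads show the improvement.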