rdtand committed · verified · Commit 1e1f27f · 1 Parent(s): d6829ee

Upload README.md with huggingface_hub

Files changed (1): README.md (+2 −0)
README.md CHANGED
@@ -51,6 +51,8 @@ The speedup comes from eliminating ~5 GB of BF16 weight loads per token for the
 
 95-99% quality recovery across knowledge and reasoning benchmarks. Quantizing the DeltaNet linear attention layers to FP4 is near-lossless.
 
+**Note:** GSM8k results are excluded as the model's thinking/reasoning output format interferes with lm-eval-harness answer extraction, producing unreliable scores. Subjective quality in interactive use (Open WebUI, chat API) is excellent with reasoning intact.
+
 ## Quantization details
 
 - **Method**: llm-compressor `oneshot` with calibrated NVFP4 (W4A4)
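The quantization method named in the diff (llm-compressor `oneshot` with calibrated NVFP4) can be sketched roughly as below. This is a minimal, untested sketch, not the author's actual script: the model path, calibration dataset, sample count, and `ignore` list are assumptions, and the `NVFP4` scheme name assumes a recent llm-compressor / compressed-tensors release that includes it.

```python
# Hedged sketch of calibrated NVFP4 (W4A4) quantization via llm-compressor.
# All concrete values (paths, dataset, counts) are illustrative assumptions.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "path/to/bf16-checkpoint"  # hypothetical: the original BF16 model

recipe = QuantizationModifier(
    targets="Linear",    # quantize Linear layers, incl. DeltaNet projections
    scheme="NVFP4",      # FP4 weights + activations with calibrated scales
    ignore=["lm_head"],  # assumed: output head often kept in higher precision
)

oneshot(
    model=MODEL_ID,
    recipe=recipe,
    dataset="open_platypus",       # assumed calibration corpus
    num_calibration_samples=512,   # assumption; more samples, better scales
    max_seq_length=2048,
    output_dir="model-NVFP4",
)
```

Because NVFP4 quantizes activations as well as weights, a calibration pass over representative samples is needed to set the activation scales; weight-only schemes can skip it.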