AEON-7 committed on
Commit 77e1418 · verified · 1 Parent(s): c2ebb91

Doc: validated on unified aeon-vllm-ultimate:latest base (2026-06-09 fleet check)

Files changed (1)
  1. README.md +2 -0
README.md CHANGED
@@ -31,6 +31,8 @@ tags:
 
  # Qwen3.6-27B-AEON-Ultimate-Uncensored-NVFP4
 
+ > ✅ **Validated 2026-06-09 on the unified [AEON vLLM Ultimate image](https://github.com/AEON-7/vllm-ultimate-dgx-spark) `ghcr.io/aeon-7/aeon-vllm-ultimate:latest`** (vLLM 0.22.1+pr44389) — loads + serves cleanly (compressed-tensors) with the z-lab DFlash drafter @ n=12 — measured **33.2 tok/s single / 154.4 tok/s conc×16**, 28% DFlash acceptance. Recommended container base.
+
  > **Deployment, operations & benchmarks → [github.com/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash](https://github.com/AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored-DFlash)**
  >
  > The GitHub repo is the source of truth for the production deployment guide, hardware-tuned docker-compose configs, full configuration reference, measured throughput benchmarks (32 tok/s median / 56 tok/s peak / 350 ms TTFT on DGX Spark), and `AGENTS.md` — an operator's manual that pre-empts common stale-documentation traps for AI coding agents working on this stack.
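
Reviewer note: the two throughput figures in the added line are easier to compare once the concurrent number is broken down per request. The sketch below is only arithmetic on the values quoted in the diff. It assumes that 154.4 tok/s is the aggregate rate summed over all 16 concurrent requests, which the commit does not state explicitly, and the function names are hypothetical helpers, not part of vLLM or the AEON tooling.

```python
def per_stream_throughput(aggregate_tok_s: float, concurrency: int) -> float:
    """Average tokens/s seen by each request, assuming the aggregate
    figure is summed across all concurrent requests (an assumption)."""
    if concurrency <= 0:
        raise ValueError("concurrency must be a positive integer")
    return aggregate_tok_s / concurrency


def batching_speedup(aggregate_tok_s: float, single_tok_s: float) -> float:
    """Ratio of aggregate throughput under load to single-request throughput."""
    if single_tok_s <= 0:
        raise ValueError("single_tok_s must be positive")
    return aggregate_tok_s / single_tok_s


# Figures quoted in the added README line.
SINGLE_TOK_S = 33.2
AGGREGATE_TOK_S = 154.4
CONCURRENCY = 16

if __name__ == "__main__":
    per_stream = per_stream_throughput(AGGREGATE_TOK_S, CONCURRENCY)
    speedup = batching_speedup(AGGREGATE_TOK_S, SINGLE_TOK_S)
    print(f"per-stream: {per_stream:.2f} tok/s")  # 9.65 under the stated assumption
    print(f"aggregate speedup: {speedup:.2f}x")   # 4.65
```

Under that reading, each of the 16 requests would see roughly 9.65 tok/s while total output is about 4.65 times the single-request rate. If 154.4 tok/s were instead a per-request figure, neither derived number would apply.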