## Standalone Inference (No Server)
**Recommendation:** For the best performance, use the vLLM server described above. vLLM uses KV caching, CUDA graphs, and optimized kernels, making it **significantly faster** than HuggingFace-native inference. The script below is provided for quick testing and demos where spinning up a server is inconvenient.
For quick testing without starting a vLLM server, use the HuggingFace-native inference script:
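The script itself is not reproduced here, but a HuggingFace-native generation helper along these lines is a reasonable sketch of the approach (the model name is a placeholder, not this repository's actual checkpoint, and the real script may differ in sampling parameters):

```python
# Minimal sketch of HuggingFace-native inference with transformers.
# The model name passed to generate() is a placeholder -- substitute the
# repository's actual checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate(prompt: str, model_name: str, max_new_tokens: int = 128) -> str:
    """Load the tokenizer and model, then decode a completion for `prompt`."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    ).to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Strip the prompt tokens so only the newly generated text is returned.
    return tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


# Example usage (placeholder model name):
#   print(generate("Hello, world!", model_name="your-org/your-model"))
```

Because this path rebuilds the forward pass token by token without vLLM's caching and kernel optimizations, expect it to be noticeably slower; it is only meant for one-off checks.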