acnagle committed · Commit a8ec481 · verified · 1 Parent(s): f1e3d85

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -159,7 +159,7 @@ Set these environment variables before running `start_server.sh` or `serve.py`:
 
 ## Standalone Inference (No Server)
 
-> **Recommendation:** For the best performance, use the vLLM server described above. vLLM uses KV caching, CUDA graphs, and optimized kernels, making it **significantly faster** than HuggingFace-native inference. The script below is provided for quick testing and demos where spinning up a server is inconvenient.
+**Recommendation:** For the best performance, use the vLLM server described above. vLLM uses KV caching, CUDA graphs, and optimized kernels, making it **significantly faster** than HuggingFace-native inference. The script below is provided for quick testing and demos where spinning up a server is inconvenient.
 
 For quick testing without starting a vLLM server, use the HuggingFace-native inference script:
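The entire change in this commit is the removal of the leading `> ` marker, which affects only how the recommendation renders. A minimal markdown illustration of the before/after (the ellipses stand in for the full recommendation text):

```markdown
<!-- before: leading "> " renders the line as a blockquote callout -->
> **Recommendation:** For the best performance, use the vLLM server ...

<!-- after: no marker, so the line renders as an ordinary bold paragraph -->
**Recommendation:** For the best performance, use the vLLM server ...
```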