## Standalone Inference (No Server)
**Recommendation:** For the best performance, use the vLLM server described above. vLLM uses KV caching, CUDA graphs, and optimized kernels, making it **significantly faster** than HuggingFace-native inference. The script below is provided for quick testing and demos where spinning up a server is inconvenient.
For quick testing without starting a vLLM server, use the HuggingFace-native inference script:
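The script itself is not reproduced here, but a HuggingFace-native generation helper along these lines is a reasonable sketch of the approach (the model name is a placeholder, not this repository's actual checkpoint, and the real script may differ in sampling parameters):

```python
# Minimal sketch of HuggingFace-native inference with transformers.
# The model name passed to generate() is a placeholder -- substitute the
# repository's actual checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate(prompt: str, model_name: str, max_new_tokens: int = 128) -> str:
    """Load the tokenizer and model, then decode a completion for `prompt`."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    ).to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Strip the prompt tokens so only the newly generated text is returned.
    return tokenizer.decode(
        output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


# Example usage (placeholder model name):
#   print(generate("Hello, world!", model_name="your-org/your-model"))
```

Because this path rebuilds the forward pass token by token without vLLM's caching and kernel optimizations, expect it to be noticeably slower; it is only meant for one-off checks.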