# TinyLlama-1.1B-Chat-v1.0-zse-int4
Pre-converted ZSE model for ultra-fast inference.
## Source Model
- Original: TinyLlama/TinyLlama-1.1B-Chat-v1.0
- Quantization: INT4
- File Size: 0.71 GB
- Format: ZSE binary (.zse)
## Usage

```bash
pip install zllm-zse

# Download and serve
zse pull tinyllama-1.1b-chat-v1.0
zse serve tinyllama-1.1b-chat-v1.0

# Or serve a local file directly
zse serve TinyLlama-1.1B-Chat-v1.0-zse-int4.zse
```
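Once the server is running, you can send it prompts over HTTP. As a minimal sketch, this assumes `zse serve` exposes an OpenAI-compatible chat completions endpoint on `localhost:8000`; the host, port, and route here are assumptions, so check the ZSE documentation for the actual API.

```python
import json
import urllib.request

# Assumed endpoint — `zse serve` is presumed to expose an OpenAI-compatible
# chat completions API on localhost:8000; verify the real host/port/route
# against the ZSE documentation.
ZSE_URL = "http://localhost:8000/v1/chat/completions"

def build_request(prompt: str, max_tokens: int = 128) -> dict:
    """Assemble an OpenAI-style chat completion payload."""
    return {
        "model": "tinyllama-1.1b-chat-v1.0",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str) -> str:
    """POST the prompt to the local ZSE server and return the reply text."""
    data = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        ZSE_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `chat("What is INT4 quantization?")` would return the model's reply as a string, assuming the endpoint above is correct.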
## Benefits

- 5x faster cold start than standard Hugging Face model loading
- 10-14% less VRAM via ZSE's custom INT4 kernels
- Single file — tokenizer and config embedded
- No internet connection required after download
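To illustrate where the memory savings come from, here is a minimal sketch of symmetric INT4 quantization: each weight is mapped to an integer in [-8, 7] plus a shared scale, so two weights pack into one byte. This is illustrative only — ZSE's actual grouping and kernel layout are not documented here.

```python
# Illustrative symmetric INT4 quantization: one shared scale per tensor.
# ZSE's real scheme (group size, zero points, packing) may differ.

def quantize_int4(weights: list[float]) -> tuple[list[int], float]:
    """Map floats to integers in [-8, 7] with one shared scale."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from 4-bit codes."""
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.07]
q, s = quantize_int4(w)
restored = dequantize_int4(q, s)
# Each 4-bit code replaces a 16- or 32-bit float, so storage shrinks
# roughly 4-8x versus fp16/fp32; the reconstruction error per weight
# is at most half the scale.
```

Real INT4 schemes usually quantize in small groups (e.g. per 32 or 128 weights) with one scale per group, trading a little extra storage for much lower error.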
## Benchmarks
See ZSE Documentation for full benchmarks.
Converted with ZSE v1.4.0