TinyLlama-1.1B-Chat-v1.0-zse-int4

A pre-converted ZSE (INT4) build of TinyLlama-1.1B-Chat-v1.0 for fast local inference.

Source Model

TinyLlama/TinyLlama-1.1B-Chat-v1.0

Usage

pip install zllm-zse

# Download and serve
zse pull tinyllama-1.1b-chat-v1.0
zse serve tinyllama-1.1b-chat-v1.0

# Or direct
zse serve TinyLlama-1.1B-Chat-v1.0-zse-int4.zse
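
Once the server is running, clients can query it over HTTP. The endpoint path and request shape below are an assumption (an OpenAI-compatible chat API on localhost port 8000); this card does not document the actual ZSE serve interface, so adjust both to match it.

```python
import json
import urllib.request

# Assumed endpoint and port; the real ZSE serve interface may differ.
URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "tinyllama-1.1b-chat-v1.0",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}

def chat(url=URL, body=payload):
    """POST the chat request and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(chat())
```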

Benefits

  • 5x faster cold start compared to loading from Hugging Face
  • 10-14% less VRAM with ZSE's custom INT4 kernels
  • Single file — tokenizer and config embedded
  • No internet required after download
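
To see where the VRAM savings come from, here is a minimal sketch of grouped INT4 weight quantization: each group of weights shares one scale, and two signed 4-bit codes are packed per byte. This is illustrative only, not the actual ZSE kernel format, which this card does not document.

```python
def quantize_int4(weights, group_size=4):
    """Quantize floats to signed 4-bit codes with one scale per group."""
    packed, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        # Map the largest magnitude in the group to code 7 (range is -8..7).
        scale = max(abs(w) for w in group) / 7 or 1.0
        codes = [max(-8, min(7, round(w / scale))) for w in group]
        scales.append(scale)
        # Pack two signed 4-bit codes into one byte.
        for j in range(0, len(codes), 2):
            lo = codes[j] & 0xF
            hi = (codes[j + 1] & 0xF) if j + 1 < len(codes) else 0
            packed.append(lo | (hi << 4))
    return packed, scales

def dequantize_int4(packed, scales, group_size=4):
    """Recover approximate floats from packed codes and per-group scales."""
    out = []
    bytes_per_group = group_size // 2
    for g, scale in enumerate(scales):
        for byte in packed[g * bytes_per_group:(g + 1) * bytes_per_group]:
            for nib in (byte & 0xF, byte >> 4):
                # Sign-extend the 4-bit code back to -8..7.
                code = nib - 16 if nib >= 8 else nib
                out.append(code * scale)
    return out
```

Four FP32 weights take 16 bytes; here the same group takes 2 bytes of codes plus one shared scale, which is the rough mechanism behind the VRAM reduction (real kernels use larger groups and fused dequantization).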

Benchmarks

See ZSE Documentation for full benchmarks.


Converted with ZSE v1.4.0

