
Is it possible to release a version with low bit quantization?

#11
by lan0004 - opened

It works really well with OpenClaw, so a local low-bit quantized version would be great for people running it on their own hardware.

StepFun org

Are you asking for a 2-bit or even a 1.58-bit version?

For example, the INT4 quantized model plus context space and system overhead currently exceeds 128 GB, so it cannot fit on a machine like one with an AI MAX 395.
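As a rough sanity check of why lower bit widths matter here, the weight footprint scales linearly with bits per weight. A minimal sketch (the parameter count and bits-per-weight values below are hypothetical placeholders, not confirmed figures for this model):

```python
def weight_gib(params_b: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of the weights alone, in GiB.

    params_b: parameter count in billions (hypothetical value below).
    bits_per_weight: effective bits per weight of the quantization format.
    """
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

# Hypothetical 200B-parameter model (NOT the confirmed size of this model):
for bits in (4.5, 2.6, 1.58):  # roughly 4-bit, 2-bit, and ternary formats
    print(f"{bits:>5} bpw -> {weight_gib(200, bits):.1f} GiB")
```

On these assumed numbers, dropping from ~4.5 to ~1.58 bits per weight shrinks the weights by roughly two thirds, which is the gap between overflowing and comfortably fitting a 128 GB machine.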

StepFun org

@lan0004 I am not sure I understand your question. We did successfully run this model with INT4 on an AI MAX 395.

StepFun org
edited Feb 9

For example, the INT4 quantized model plus context space and system overhead currently exceeds 128 GB, so it cannot fit on a machine like one with an AI MAX 395.

Sorry for the late reply. On the AI MAX 395, I can run the INT4 model with up to around a 64K context; see https://github.com/stepfun-ai/Step-3.5-Flash/blob/main/llama.cpp/docs/step3.5-flash.md
