Is it possible to release a version with low-bit quantization?
#11
by lan0004 - opened
It works really well with OpenClaw, so a low-bit quantized version would be great for those who want to run it locally.
Are you asking for a 2-bit or even a 1.58-bit version?
For example, to run it on a machine with an AI MAX 395, the total footprint of the INT4 model weights plus context space and system overhead currently exceeds 128 GB.
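As a rough sketch of where that kind of number comes from (all figures below are placeholders, not the model's published specs), INT4 stores about half a byte per weight:

```bash
# Back-of-envelope memory estimate. All numbers here are hypothetical
# placeholders, not Step-3.5-Flash's real figures.
PARAMS_B=240                      # hypothetical parameter count, in billions
WEIGHTS_GB=$(( PARAMS_B / 2 ))    # INT4: 4 bits = 0.5 bytes per weight
KV_GB=8                           # hypothetical KV cache at long context
OVERHEAD_GB=4                     # hypothetical runtime/system overhead
echo "~$(( WEIGHTS_GB + KV_GB + OVERHEAD_GB )) GB total"   # prints ~132 GB
```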
Sorry for the late reply. On the AI MAX 395, I can run the int4 model with up to around a 64K context. Setup instructions: https://github.com/stepfun-ai/Step-3.5-Flash/blob/main/llama.cpp/docs/step3.5-flash.md
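For anyone wanting to reproduce this, a minimal sketch of the kind of command involved, assuming you have built llama.cpp per the doc above and have a GGUF export of the INT4 weights (the filename below is a placeholder, not an official release artifact):

```bash
# Hypothetical invocation; the GGUF filename is a placeholder, and the exact
# build steps and flags are in the step3.5-flash.md doc linked above.
# -c 65536 requests a ~64K context; -ngl 99 offloads all layers to the GPU.
./llama-cli -m step3.5-flash-int4.gguf -c 65536 -ngl 99 -p "Hello"
```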