Reasoning

#2
by bewilderbeast - opened

Has anyone managed to use reasoning with this model? I have not been able to get reasoning to work with any of my frontends; the output of the model does not contain any reasoning trace. I have included <|think|> at the beginning of my system prompts.

Here is the command I use to run the model:
```
docker run --rm --init --network=host --gpus all --ipc=host \
  -v /var/llamamodels:/models --name vllm-gemma4 vllm-custom:latest \
  --model /models/cyankiwi/gemma-4-31B-it-AWQ-8bit --port 8001 \
  --served-model-name gemma-4-awq-vllm \
  --reasoning-parser gemma4 --enable-auto-tool-choice --tool-call-parser gemma4 \
  --tensor-parallel-size 1 --max-model-len 160000 --gpu-memory-utilization 0.75
```

vllm-custom is my Docker image with transformers 5.60.dev. I have built it both on vllm-openai:nightly (which identifies as v0.18.2rc) and on vllm-openai:v0.19.0.

Has anyone gotten reasoning working?

This is the instruct version without reasoning; you can identify it by the "-it-" part of the model name @bewilderbeast

edit: sorry for the misinformation, I didn't read properly and was used to this nomenclature from the Qwen think/instruct models

cyankiwi org

Please pass the tag <|think|> in the system content, i.e., {"role": "system", "content": "<|think|>"}, {"role": "user", "content": "Hey, how are you?"}, or pass chat_template_kwargs={"enable_thinking": True} to the chat template.

This is the same with the full-precision model.
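For anyone hitting the same issue: a minimal sketch of what the request body would look like with both options combined, assuming the server from the docker command above is listening on port 8001 with the served model name gemma-4-awq-vllm. vLLM's OpenAI-compatible server accepts a top-level chat_template_kwargs field in the request body, which is how enable_thinking reaches the chat template.

```python
import json

# Request body with both suggestions: <|think|> in the system content
# and enable_thinking passed through chat_template_kwargs.
payload = {
    "model": "gemma-4-awq-vllm",  # --served-model-name from the docker command
    "messages": [
        {"role": "system", "content": "<|think|>"},
        {"role": "user", "content": "Hey, how are you?"},
    ],
    "chat_template_kwargs": {"enable_thinking": True},
}

# Sending it requires the running server, e.g.:
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:8001/v1/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(urllib.request.urlopen(req).read().decode())
print(json.dumps(payload, indent=2))
```

With the official OpenAI Python client, chat_template_kwargs would go through the extra_body parameter of chat.completions.create, since it is not a standard OpenAI field.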

Thank you for responding. I had already set the <|think|> tag at the beginning of the system message; I additionally had to set chat_template_kwargs, which finally helped. Thank you!

bewilderbeast changed discussion status to closed
