Is there a way to enable/disable thinking at the request level?

#39
by septerium - opened

Hi! Thanks for the great work bringing these amazing quants to the community! I have really enjoyed Qwen3.5-35B-A3B-Q6_K.gguf for local agentic coding!

I have just read the "Qwen3.5 - How to Run Locally Guide" and have been testing --chat-template-kwargs '{"enable_thinking":false}' with the llama-server command. Would it be possible to disable thinking only for specific requests, instead of setting it as a fixed behavior for the whole server?

For the llama.cpp web interface, up-to-date llama.cpp versions expose this as a toggle in the UI settings.

If you want to use it from Python, it will work as long as the client framework allows setting this non-standard flag.
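As a sketch of what per-request control could look like: llama-server's OpenAI-compatible /v1/chat/completions endpoint accepts extra JSON fields, and a chat_template_kwargs field in the request body is the natural place for enable_thinking. Since this is a non-standard extension (and whether your llama.cpp build honors it per request is an assumption here), the example below only builds and prints the request payload rather than sending it.

```python
import json

def build_chat_request(prompt: str, enable_thinking: bool) -> dict:
    """Build a /v1/chat/completions payload that toggles thinking per request.

    The "chat_template_kwargs" field is a llama.cpp-specific extension,
    not part of the standard OpenAI API.
    """
    return {
        "model": "Qwen3.5-35B-A3B-Q6_K.gguf",  # model name from this thread
        "messages": [{"role": "user", "content": prompt}],
        # Forwarded to the Jinja chat template when the server supports it:
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }

payload = build_chat_request("Explain quicksort briefly.", enable_thinking=False)
print(json.dumps(payload, indent=2))
```

With the official OpenAI Python client, the same non-standard field can typically be smuggled in via the extra_body parameter of client.chat.completions.create, since extra_body entries are merged into the request JSON as-is.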
