Is there a way to enable/disable thinking at the request level?
#39
by septerium - opened
Hi! Thanks for the great work of bringing these amazing quants to the community!! I have really enjoyed Qwen3.5-35B-A3B-Q6_K.gguf for local agentic coding!
I have just read "Qwen3.5 - How to Run Locally Guide" and been testing the use of --chat-template-kwargs '{"enable_thinking":false}' with the llama-server command. Would it be possible to disable thinking only for specific requests, instead of setting this as a fixed behavior?
