Is there a way to enable/disable thinking at the request level?

#39
by septerium - opened

Hi! Thanks for the great work bringing these amazing quants to the community! I have really enjoyed Qwen3.5-35B-A3B-Q6_K.gguf for local agentic coding!

I have just read the "Qwen3.5 - How to Run Locally Guide" and have been testing --chat-template-kwargs '{"enable_thinking":false}' with the llama-server command. Would it be possible to disable thinking only for specific requests, instead of setting it as a fixed behavior for the whole server?

For the llama.cpp web interface, up-to-date llama.cpp versions expose this as a toggle in the UI settings.

If you want to use it from Python, it will work as long as the client framework allows setting this non-standard flag.
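As a sketch of what per-request control could look like: llama-server's OpenAI-compatible /v1/chat/completions endpoint accepts extra JSON fields, and a chat_template_kwargs field in the request body is the natural place for enable_thinking. Since this is a non-standard extension (and whether your llama.cpp build honors it per request is an assumption here), the example below only builds and prints the request payload rather than sending it.

```python
import json

def build_chat_request(prompt: str, enable_thinking: bool) -> dict:
    """Build a /v1/chat/completions payload that toggles thinking per request.

    The "chat_template_kwargs" field is a llama.cpp-specific extension,
    not part of the standard OpenAI API.
    """
    return {
        "model": "Qwen3.5-35B-A3B-Q6_K.gguf",  # model name from this thread
        "messages": [{"role": "user", "content": prompt}],
        # Forwarded to the Jinja chat template when the server supports it:
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }

payload = build_chat_request("Explain quicksort briefly.", enable_thinking=False)
print(json.dumps(payload, indent=2))
```

With the official OpenAI Python client, the same non-standard field can typically be smuggled in via the extra_body parameter of client.chat.completions.create, since extra_body entries are merged into the request JSON as-is.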
