Jinja Template

#1
by SerialKicked - opened

Congrats on the new release! I didn't get the time to fully test it yet.

I've noticed you're using the default Jinja template for your models. I'd suggest using my edit, as it fixes a couple of issues and allows system messages in the middle of the prompt instead of crashing the frontend. I know most of your target audience still uses text completion mode, but still :p

I use chat completion on Open-WebUI and haven't run into any issues.

What does this fix exactly?

System messages in the middle of a prompt like:

[system] main prompt
[user] Hi!
[system] some external info sent by frontend / plugin / whatever to steer discussion <- boom
[bot] Hello!

would trigger an exception (line 85), as the official templates enforce user/bot alternation and only want a single system message in the first position.

There's also a minor issue on some backends (LMStudio i think? Not llama.cpp) with the tool logic here:

    {%- if ns.multi_step_tool %}
        {{- raise_exception('No user query found in messages.') }}
    {%- endif %}

That's at line 78 of the old template. It would always trigger when there's no tool. I'm not sure why it happens (probably a half-assed Jinja or OpenAI API implementation in the backend), but removing this block has literally no adverse effect, as it's purely defensive "coding".

would trigger an exception (line 85), as the official templates enforce user/bot alternation and only want a single system message in the first position.

Haven't seen this on either vllm or ik_llamacpp. How would I trigger this?

With a chatlog containing a list of messages like:

{ role = system; content = "You're a helpful assistant, blah blah." },
{ role = assistant; content = "how can I help you today?" },  
{ role = system; content = "some text" },
{ role = user; content = "what's 2+2?"  }  

(If you're using Textgen WebUI, I'm pretty sure it has a "/sys" command, or something like that, to post a system message mid-discussion. SillyTavern has one as well. I don't remember their slash commands by heart, tho :p)

The template triggers an exception because it expects only a single system message at the very top.

This one will happen on literally any backend because of this block:

    {%- if message.role == "system" %}
        {%- if not loop.first %}
            {{- raise_exception('System message must be at the beginning.') }}
        {%- endif %}

So, sure, not all front-ends are going to insert system messages in the middle of a discussion, but for the roleplay-centric ones (SillyTavern for instance, especially with plugins), it's often used as a way to steer the discussion (models are told to consider system messages without referencing them directly, a kind of OOC message).
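For anyone who wants to see it fail outside a frontend, here's a minimal sketch with the jinja2 package (not the actual template; `raise_exception` is a stand-in for the helper that backends like llama.cpp and transformers inject into the template environment):

```python
from jinja2 import Environment
from jinja2.exceptions import TemplateError

def raise_exception(msg):
    # Stand-in for the helper backends inject into the template env.
    raise TemplateError(msg)

env = Environment()
env.globals["raise_exception"] = raise_exception

# The offending check, reduced to its core.
tpl = env.from_string(
    "{%- for message in messages -%}"
    "{%- if message.role == 'system' and not loop.first -%}"
    "{{- raise_exception('System message must be at the beginning.') -}}"
    "{%- endif -%}"
    "{%- endfor -%}"
    "ok"
)

messages = [
    {"role": "system", "content": "main prompt"},
    {"role": "user", "content": "Hi!"},
    {"role": "system", "content": "mid-chat steering"},  # <- boom
    {"role": "assistant", "content": "Hello!"},
]

try:
    tpl.render(messages=messages)
except TemplateError as err:
    print("backend refuses:", err)
```

With the mid-chat system message removed, the same template renders fine; with it present, the render aborts, which is exactly the "RED TEXT OF DOOM" the frontend reports.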

Edit: Of course that's for chat completion backends. If you're using the backend in text completion, the jinja template has no effect / is not used at all.

Google results of people encountering the issue in the wild
(and I get that it's technically not a bug, but it's kinda limiting for an RP model)

@gecfdo @ToastyPigeon @Retreatcost .. literally anyone else.

Anyone seeing crashing with our provided jinja template?

I will replace it if there are people with this issue.

Congrats on the new release! I didn't get the time to fully test it yet.

I've noticed you're using the default Jinja template for your models. I'd suggest using my edit, as it fixes a couple of issues and allows system messages in the middle of the prompt instead of crashing the frontend. I know most of your target audience still uses text completion mode, but still :p

This template is almost identical to the official one, but has a bug where the system message gets rendered twice: lines 62-65 render it, then the main loop at lines 80-81 renders it again. This could cause issues.

would trigger an exception (line 85), as the official templates enforce user/bot alternation and only want a single system message in the first position.

Haven't seen this on either vllm or ik_llamacpp. How would I trigger this?

The official template has a bug-like quirk where it re-validates but doesn't skip the system message in the main loop (should be harmless, since it just doesn't re-render it).

If you want to replace the template, you can test either:

https://huggingface.co/DavidAU/Qwen3.5-40B-Claude-4.5-Opus-High-Reasoning-Thinking/blob/main/chat_template.jinja

Or

https://huggingface.co/win10/Huihui-Qwen3.5-27B-abliterated-FP8/blob/main/chat_template_qwen35_mm_interleaved_thinking.jinja

DavidAU's version has the cleanest and best-organized code overall; it properly fixes the double-system-message issue with a system_rendered flag, and it uses XML-structured tool definitions instead of raw JSON dumps, which is more readable for the model.

The interleaved-thinking version is the most sophisticated, as it properly handles interleaved thinking (it hides reasoning from historical turns before the last user query) and has better tool-argument handling via dedicated macros, with the most robust error handling.

Oh damn, you're right! I accidentally doubled the system prompt. I'm not too sure about the XML part for tool calls. Feels like it'd go against the model's training. I'll import his flag system, though. That's a great idea to bypass the double prompt issue.

Edit: Fixed mine, thanks!

Ready.Art org

Thanks, I will look into this further.

Edit: Fixed mine, thanks!

There's still a bug. Line 72 overwrites the namespace:

Line 46:

{%- set ns = namespace(system_rendered=false) %}
...sets ns.system_rendered = true after rendering system...

Line 72:

{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}

Line 72 creates a brand new ns object, wiping out system_rendered. So when line 84 checks ns.system_rendered, it's undefined, which means the system message will get rendered again: the double-render bug is back.
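For other template authors reading along, the pitfall is easy to demonstrate with the jinja2 package directly (a minimal sketch; the attribute names just mirror the template's):

```python
from jinja2 import Environment

env = Environment()

# Buggy pattern: the second `set ns = namespace(...)` builds a brand-new
# object, so the flag set on the first one is gone.
buggy = env.from_string(
    "{% set ns = namespace(system_rendered=false) %}"
    "{% set ns.system_rendered = true %}"
    "{% set ns = namespace(multi_step_tool=true) %}"
    "{{ ns.system_rendered }}"  # undefined on the new object -> renders empty
)
print(repr(buggy.render()))  # -> ''

# Safe pattern: keep one namespace and add attributes to it.
fixed = env.from_string(
    "{% set ns = namespace(system_rendered=false) %}"
    "{% set ns.system_rendered = true %}"
    "{% set ns.multi_step_tool = true %}"
    "{{ ns.system_rendered }}"
)
print(fixed.render())  # -> True
```

An attribute that was never set on a namespace comes back as undefined (renders as an empty string), which is why the template silently falls back to re-rendering the system message instead of erroring.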

That'll teach me to work after a sleepless night :'(

replaced the initial set with {%- set system_rendered = namespace(value=false) %}

and the value sets/checks with {%- set system_rendered.value = true %} and {%- if message.role == "system" and system_rendered.value and loop.first %}

Thanks again, mate :)

That'll teach me to work after a sleepless night :'(

replaced the initial set with {%- set system_rendered = namespace(value=false) %}

and the value sets/checks with {%- set system_rendered.value = true %} and {%- if message.role == "system" and system_rendered.value and loop.first %}

Thanks again, mate :)

There is still inconsistent indentation (tabs vs spaces on lines 64 and 69).

Tabs vs spaces doesn't matter in Jinja afaik, but I'll clean it up (also, I blame the text editor). -> done

Tabs vs spaces doesn't matter in Jinja afaik, but I'll clean it up (also, I blame the text editor).

Do that and you will have a great chat_template.jinja replacement that sticks close to the official version and to what the model was trained with and expects.

@gecfdo @ToastyPigeon @Retreatcost .. literally anyone else.

Anyone seeing crashing with our provided jinja template?

I will replace it if there are people with this issue.

No, everything worked fine.

Also haven't seen system role being used multiple times, I don't think regular front-ends even allow that.

I don't think regular front-ends even allow that.

It's "only" commonly used for WorldInfo/Lorebooks and VectorDB/RAG; there's even a /sys command in SillyTavern for arbitrary manual insertions.

Here, because apparently frontends are hard to use:

(screenshot)

Same issue on every front end on earth (except mine, because I auto override templates at this point :D)

Ready.Art org

I don't think regular front-ends even allow that.

It's "only" commonly used for WorldInfo/Lorebooks and VectorDB/RAG; there's even a /sys command in SillyTavern for arbitrary manual insertions.

Here:
(screenshot)

Then, you do you.

Is this specific to a backend like TabbyAPI or something?

I can't replicate this in VLLM or Llamacpp with Chat Completion on OpenWebUI when using RAG.

(screenshot)

Ready.Art org

@SerialKicked poke

I want to fix this, I just need a way to replicate it so I can verify it happens.

If I use sillytavern, what do I need to do to trigger it?

Ready.Art org

I threw in a random assistant chat on ST and used the assistant message and it works fine. I'm using chat completion with tools.

(screenshot)

I threw in a random assistant chat on ST and used the assistant message and it works fine. I'm using chat completion with tools.

(screenshot)

You can just use this: https://huggingface.co/SerialKicked/Lethe-AI-Repo/blob/main/Fixed%20JINJA%20Templates/ChatML-Qwen3.5.jinja

And rename it chat_template.jinja and replace it; it's basically the same as the original, just without the bug-like quirk where it re-validates but doesn't skip the system message in the main loop.

Ready.Art org

Hi,

My question is how I can replicate the bad behavior you are talking about (i.e. crashing) on SillyTavern specifically.

What steps do I need to take? I started an assistant chat, then did /sys to inject a sys prompt message. Then I chatted again and it still worked.

Hi,

My question is how I can replicate the bad behavior you are talking about (i.e. crashing) on SillyTavern specifically.

What steps do I need to take? I started an assistant chat, then did /sys to inject a sys prompt message. Then I chatted again and it still worked.

You're gonna need to ask @SerialKicked , as I never experienced this issue myself.

Ready.Art org

:S

Yeah I haven't either... that's why I'm confused on this entire thing lol

I've run into this issue with a couple of models (I think mostly Qwen-3.5 based) but I believe I was able to work around it using prompt squashing or something.

Also, I think there may be a miscommunication happening here: someone used the term "crash," which is not how I would describe what happened. It's an error, and generation stops. (I believe llama.cpp treats it as a 500 error, but I don't recall llama.cpp crashing, and ST definitely didn't crash.)

Ready.Art org

But I'm still not experiencing that lol

Sorry, I was AFK for a couple days.

Now I'm confused, because I literally took your own GGUF (which I assume has the same template as this uncompressed model, right?), updated my old SillyTavern to the latest version with all settings close to default, used llama.cpp as a backend, loaded in chat completion mode, loaded an old discussion, and did in the dialog box:

/sys some message [enter]
--> system message shows up

hello [enter]
--> "RED TEXT OF DOOM" pops up, and the backend refuses to go any further until the system message is deleted.

So I'm not too sure what's different.

I've run into this issue with a couple of models (I think mostly Qwen-3.5 based) but I believe I was able to work around it using prompt squashing or something.

Yep, right now it's only Qwen 3.5 (many popular Qwen 3.5 finetunes already use a custom template, like the Claude-distilled ones) and the new Mistral Not-So-Small that have those weird limitations in their "official" Jinja. That was discussed in a Mistral thread recently. That's kinda why I'm trying to get creative model devs to actually use a non-limiting template before they get too widespread.

You can indeed get around it with system message squashing (I think it's in the left panel with the inference settings in ST), assuming it means the same thing it normally does. In that case it'll move all system messages into the system prompt. Sure, this works around the issue, but depending on the nature of the system messages (like date information, or OOC directions), it can instead further confuse the model. You can also just load an external template file in llama-server, but I don't expect most users to understand that stuff.

Also, I think there may be a miscommunication happening here: someone used the term "crash," which is not how I would describe what happened. It's an error, and generation stops. (I believe llama.cpp treats it as a 500 error, but I don't recall llama.cpp crashing, and ST definitely didn't crash.)

Oh right, I mean the template crashes, not the whole silly tavern, lol.

If you want to be technical, llama.cpp (or any backend using Jinja templates) is just executing the template to format the prompt. And if there's a system message in the middle of the whole prompt, the default template refuses to go any further and "crashes". That's what I meant :)


At the end of the day, my template respects the official format to a T (contrary to the other ones floating around), except that it allows a more permissive list of messages. It's not going to dumb down your model, nor change its output in any way, shape, or form; it'll just make it easier to use for more varied use cases. I'm just trying to help, as it doesn't impact me either way whether you use my template (or one of the others) or not.

@FrenzyBiscuit poke (my turn :D)

Ready.Art org

Let's try to narrow this down. Which GGUF are you specifically testing this against?

Why don't you just read what the template says? It literally says it raises an error if a system message is in any spot other than first:

    {%- if message.role == "system" %}
        {%- if not loop.first %}
            {{- raise_exception('System message must be at the beginning.') }}
        {%- endif %}

from https://huggingface.co/ReadyArt/Omega-Evolution-27B-v2.0/blob/main/chat_template.jinja

I have had to patch this for every qwen3.5 model which uses original templates.

Why don't you just read what the template says? It literally says it raises an error if a system message is in any spot other than first:

    {%- if message.role == "system" %}
        {%- if not loop.first %}
            {{- raise_exception('System message must be at the beginning.') }}
        {%- endif %}

from https://huggingface.co/ReadyArt/Omega-Evolution-27B-v2.0/blob/main/chat_template.jinja

I have had to patch this for every qwen3.5 model which uses original templates.

Does the Qwen3.6 model template have the same issue?

Why don't you just read what the template says? It literally says it raises an error if a system message is in any spot other than first:

    {%- if message.role == "system" %}
        {%- if not loop.first %}
            {{- raise_exception('System message must be at the beginning.') }}
        {%- endif %}

from https://huggingface.co/ReadyArt/Omega-Evolution-27B-v2.0/blob/main/chat_template.jinja

I have had to patch this for every qwen3.5 model which uses original templates.

I gave up arguing in this thread, and due to other interactions elsewhere, where they've made it extra clear their understanding of LLMs amounts to running a 3rd-party script they don't understand, I'm done. You can lead a horse to water, but you can't force it to drink. I'm not being paid to teach them Jinja 101.

In a way, it makes things easier: lots of models, lots of fine-tuners, and it's the most basic of competency tests. After running their models through my automated test gauntlet, it's on par with what I'd expect given that they don't even conceptually understand the problem (let alone the practicalities).

Why don't you just read what the template says? It literally says it raises an error if a system message is in any spot other than first:

    {%- if message.role == "system" %}
        {%- if not loop.first %}
            {{- raise_exception('System message must be at the beginning.') }}
        {%- endif %}

from https://huggingface.co/ReadyArt/Omega-Evolution-27B-v2.0/blob/main/chat_template.jinja

I have had to patch this for every qwen3.5 model which uses original templates.

Does the Qwen3.6 model template have the same issue?

It does; look at lines 83-86: https://huggingface.co/Qwen/Qwen3.6-35B-A3B/blob/main/chat_template.jinja

But it's easy AF to patch.

You just swap the line that raises error

{{- raise_exception('System message must be at the beginning.') }}
to 
{{- '<|im_start|>system\n' + content + '<|im_end|>' + '\n' }}

And it will insert the system message content as expected.
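As a sanity check, here's the swapped branch in isolation, run through the jinja2 package (a reduced sketch of the message loop, not the full template; `content` is the trimmed message content as in the official template):

```python
from jinja2 import Environment

env = Environment()

# Reduced message loop with the patched branch: a non-first system
# message is rendered like any other turn instead of raising.
tpl = env.from_string(
    "{%- for message in messages -%}"
    "{%- set content = message.content | trim -%}"
    "{%- if message.role == 'system' and not loop.first -%}"
    "{{- '<|im_start|>system\n' + content + '<|im_end|>' + '\n' -}}"
    "{%- else -%}"
    "{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' -}}"
    "{%- endif -%}"
    "{%- endfor -%}"
)

messages = [
    {"role": "system", "content": "main prompt"},
    {"role": "user", "content": "Hi!"},
    {"role": "system", "content": "mid-chat steering"},
]
print(tpl.render(messages=messages))
```

The mid-chat system message now comes out as an ordinary `<|im_start|>system ... <|im_end|>` block instead of aborting the render.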

While you're at it fixing the chat template, I can point you to 2 other things. Not as breaking, but still.

Lines 78-80: if messages has no user turn (system-only), ns.multi_step_tool stays true → raises 'No user query found in messages.'
Line 88: empty user content after |trim renders a bare <|im_start|>user\n<|im_end|>\n → the tokenizer may choke on that

{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
to
{{- '<|im_start|>' + message.role + '\n' + (content if content else ' ') + '<|im_end|>' + '\n' }}
And
{{- raise_exception('No user query found in messages.') }}
to
{%- set ns.last_query_index = messages|length - 1 %}
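The empty-content guard is easy to check in isolation with the jinja2 package (a sketch of the single render line, not the full template):

```python
from jinja2 import Environment

env = Environment()

# Patched render line: fall back to a single space when the trimmed
# content is empty, so a bare <|im_start|>user\n<|im_end|> never appears.
tpl = env.from_string(
    "{%- set content = message.content | trim -%}"
    "{{- '<|im_start|>' + message.role + '\n'"
    " + (content if content else ' ') + '<|im_end|>' + '\n' -}}"
)

print(repr(tpl.render(message={"role": "user", "content": "   "})))
# -> '<|im_start|>user\n <|im_end|>\n'
```

Non-empty content renders exactly as before; only the whitespace-only case changes.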

Fixed sending empty messages.

Ask Claude or smth if you're unsure :)

Love your work btw!

Edit: Oh, the empty message thing is a problem when using some extensions in SillyTavern. Like Qvink Memory: it sends a sysprompt without a user message for each summarization.


And if you want to experience the problem, just send a system message, or have a lorebook entry as system, or an author's note as system @depth 0 or something; it will error.
Anything that doesn't error with the broken template is literally not following the template.

Ready.Art org

Why don't you just read what the template says? It literally says it raises an error if a system message is in any spot other than first:

    {%- if message.role == "system" %}
        {%- if not loop.first %}
            {{- raise_exception('System message must be at the beginning.') }}
        {%- endif %}

from https://huggingface.co/ReadyArt/Omega-Evolution-27B-v2.0/blob/main/chat_template.jinja

I have had to patch this for every qwen3.5 model which uses original templates.

I gave up arguing in this thread, and due to other interactions elsewhere, where they've made it extra clear their understanding of LLMs amounts to running a 3rd-party script they don't understand, I'm done. You can lead a horse to water, but you can't force it to drink. I'm not being paid to teach them Jinja 101.

In a way, it makes things easier: lots of models, lots of fine-tuners, and it's the most basic of competency tests. After running their models through my automated test gauntlet, it's on par with what I'd expect given that they don't even conceptually understand the problem (let alone the practicalities).

Nobody is arguing with you on here. I asked you which GGUF quant you were using so I could test specifically with that quant, at which point I would have fixed the jinja template upon replicating the issue. You never replied. Now you're randomly replying back 22 days later just to shit-talk us, and that's really not cool!

I'm not sure why you're so upset with us (I certainly don't recall anything) so...

I use Chat Completion on both OpenWebUI and SillyTavern and have never run into this issue. But I don't use SillyTavern extensions, and I don't use system prompts for lorebooks.

But when I did try to recreate this in SillyTavern using a GGUF.. it worked.

Anyway, I think I'm going to leave this thread open. So people who want to modify the template can see it and do so. I'm not planning on changing it on our end at this point.

Another thing that (very much) stands out to me is that this model has almost 10k combined downloads (including quants), and it's a total of like three people bringing up this issue.

FrenzyBiscuit locked this discussion
FrenzyBiscuit pinned discussion
