---
license: apache-2.0
tags:
  - finetune
  - roleplay
  - chat
  - wings-of-fire
  - nsfw
  - not-for-all-audiences
base_model:
  - google/gemma-4-31B-it
---
Send me your support to help me feed the data beast! Also taking commissions for universe-specific models.
Support on Ko-fi

This model uses the standard Gemma 4 chat template. The structure is complex and requires specific configuration for correct performance. Please read the following sections carefully.
Human-Readable Format:

```
<bos><|turn>system
<|think|>{system prompt}<turn|>
<|turn>user
{user message}<turn|>
<|turn>model
<|channel>thought
*{reasoning}*<channel|>*{response}*<turn|>
```
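The turn structure above can be assembled programmatically. A minimal sketch, assuming the single-turn case only (the `render_prompt` helper is illustrative and not part of any released tokenizer; the reasoning channel is opened later by the inference-time prefill, not by this template):

```python
def render_prompt(system: str, user: str) -> str:
    """Assemble the human-readable Gemma 4 turn structure shown above."""
    return (
        "<bos><|turn>system\n"
        f"<|think|>{system}<turn|>\n"
        "<|turn>user\n"
        f"{user}<turn|>\n"
        "<|turn>model\n"
    )

prompt = render_prompt("You are a storyteller.", "Begin the tale.")
print(prompt)
```

In practice the server renders this for you from the model's bundled chat template; the helper only makes the token layout explicit.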
Strongly recommended: Always use the /v1/chat/completions endpoint. Correct formatting is handled automatically by the server.
Text completion (/v1/completions) is currently broken for impersonations without a custom patch. Sending the context as a user prompt and asking the model to generate the response can produce a malformed turn structure.
If you are using SillyTavern, make sure your connection is set to Chat Completion mode, not Text Completion / Instruct mode.
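A chat-completion request might look like the sketch below. The endpoint URL is a placeholder for your own deployment, and the extra top-level parameters shown here assume a vLLM server (they are covered in more detail in the API section further down):

```python
import json

# Payload for POST /v1/chat/completions on an OpenAI-compatible server.
payload = {
    "model": "Darkhn/Gemma-4-31B-Animus-V14.0",
    "messages": [
        {"role": "system", "content": "You are a storyteller."},
        {"role": "user", "content": "Describe the room."},
    ],
    "temperature": 1.0,
    "min_p": 0.02,
    # vLLM-specific extras; other servers may expect different names.
    "chat_template_kwargs": {"enable_thinking": True},
    "skip_special_tokens": False,
}

body = json.dumps(payload)
# Send with e.g. requests.post("http://localhost:8000/v1/chat/completions",
#                              data=body, headers={"Content-Type": "application/json"})
```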
This model was trained with asterisks (*) as content delimiters for both reasoning and the final response. This has a critical inference implication:
The opening channel token <|channel>thought is a special token that gets masked during training. As a result, the model cannot generate it on its own — it must be injected as a prefill at generation time.
You must configure your inference setup to prefill the model's turn with:

```
<|channel>thought\n*
```

(prefill only a single asterisk after the newline). Without this prefill, the model will either fail to reason or produce malformed output. Most frontends support this via a "Start Reply With" or "Output Prefix" setting. See the SillyTavern settings files for a working configuration.
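If your frontend has no prefill setting, one way to inject it over an OpenAI-compatible API is vLLM's `continue_final_message` extension: end the messages list with a partial assistant turn and ask the server to continue it rather than open a new one. A sketch (the model name is a placeholder; frontends with a "Start Reply With" field do this for you):

```python
import json

payload = {
    "model": "Darkhn/Gemma-4-31B-Animus-V14.0",
    "messages": [
        {"role": "user", "content": "Tell me about the rain."},
        # Partial assistant turn carrying the required prefill.
        {"role": "assistant", "content": "<|channel>thought\n*"},
    ],
    # vLLM extras: continue the last message instead of starting a new turn.
    "add_generation_prompt": False,
    "continue_final_message": True,
    "skip_special_tokens": False,
}
body = json.dumps(payload)
```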
When using an OpenAI-compatible API, you must add custom parameters to control thinking and ensure correct token handling. In frontends like SillyTavern, add the following JSON to the "Custom Parameters" (or extra_body) field in your connection settings:
"chat_template_kwargs": {"enable_thinking": true}
"skip_special_tokens": false
"enable_thinking": true: Activates the model's chain-of-thought reasoning. Change to false to disable it for faster, more direct responses."skip_special_tokens": false: This is critical. It prevents the API from stripping away the special tokens (like <|turn|> and <|channel|>) that are required for the model's chat template to work correctly.Example command:
```shell
vllm serve Darkhn/Gemma-4-31B-Animus-V14.0 \
  --tensor-parallel-size 4 \
  --max-model-len auto \
  --cudagraph-capture-sizes 1 \
  --port 8000 \
  --gpu-memory-utilization 0.95 \
  --enable-prefix-caching \
  --enable-auto-tool-choice \
  --tool-call-parser gemma4 \
  --limit-mm-per-prompt '{"image":0,"audio":0}' \
  --async-scheduling \
  --max-num-seqs 1 \
  --enable-chunked-prefill \
  --disable-custom-all-reduce \
  --api-key YOURKEY \
  --trust-remote-code
```
Note: Do not use --reasoning-parser gemma4 unless your frontend supports the reasoning_content field in the API response. Most frontends (including SillyTavern) handle reasoning stripping themselves using the prefix/suffix configured in their settings.
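If your client receives the raw text with the reasoning still embedded, the thought channel can be stripped with a small helper. This is an illustrative sketch, not the parser any frontend ships; it assumes the delimiters shown in the template above:

```python
import re

def strip_reasoning(raw: str) -> str:
    """Remove the thought channel from a raw model turn.

    Assumes the template's delimiters: reasoning sits between
    <|channel>thought and <channel|>, and the reply follows <channel|>.
    """
    # Drop everything from the channel opener through its closer.
    cleaned = re.sub(r"<\|channel>thought.*?<channel\|>", "", raw, flags=re.DOTALL)
    # Drop the end-of-turn marker if present.
    return cleaned.replace("<turn|>", "").strip()

raw = "<|channel>thought\n*Hmm, a storm.*<channel|>*Rain lashed the glass.*<turn|>"
print(strip_reasoning(raw))  # *Rain lashed the glass.*
```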
Use `enable_thinking` to select reasoning at runtime; the `<|think|>` token in the system turn is what activates chain-of-thought reasoning at the architecture level.

The quantized model files are available for download. Click the button below to view the files.
Download GGUF Files →

For the best roleplaying experience, it is highly recommended to use the provided character card and lore book. These files help guide the model's persona and provide rich, in-universe context.
Download Files →

For a seamless setup in SillyTavern, you can download pre-configured sampler presets. These are tuned to provide an optimal balance between creativity and narrative coherence for this model.
Simply download the .json file below and import it into SillyTavern's sampler presets menu.
Temperature: 1.0
Min P: 0.02
For the best results, use this structured format. This helps the AI clearly distinguish between actions, inner thoughts, and dialogue.
```
*He walked across the room and stared out the window.*
*-I wonder what she's thinking.-*
Alex (Curious): "What do you see out there?"
```

Standard novel-style formatting is also understood, but this structured format is preferred for clarity.
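For tooling that post-processes logs in this format, the three element types can be told apart mechanically. A hypothetical sketch (the labels are my own, not anything the model emits):

```python
import re

def classify(line: str) -> str:
    """Label one line of the structured roleplay format.

    *-...-*  inner thought,  *...*  action,  Name (Emotion): "..."  dialogue.
    """
    if re.fullmatch(r"\*-.*-\*", line):      # check thoughts before actions,
        return "thought"                      # since *-...-* also matches *...*
    if re.fullmatch(r"\*.*\*", line):
        return "action"
    if re.match(r'[^:]+\([^)]*\):\s*"', line):
        return "dialogue"
    return "narration"

print(classify('Alex (Curious): "What do you see out there?"'))  # dialogue
```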
Click the button below to view a full, unedited chatlog demonstrating the model's narrative style and character portrayal.
View Chatlog Example →

This is Version 14.0 in the Animus series. V14.0 is built on google/gemma-4-31B-it, offering a massive leap in parameter count and underlying logic compared to previous versions.
V14.0's strength comes from a novel dataset designed to teach the model the why behind the lore, not just the what. The training data has been heavily expanded for this version.
The result is a model with exceptionally strong prose and a deep grasp of in-universe lore, making for a highly immersive and accurate roleplaying experience.
Note: for roleplay, the model closely follows the system prompt and the first message, meaning that if the first assistant message is short, the following messages will also be short.
V14.0 marks a shift from model merging to a focused, direct fine-tuning approach. This allows for greater control over the final model's characteristics.
The V14.0 dataset has been significantly expanded from previous versions.
All datasets underwent a rigorous cleaning process to remove formatting artifacts, including stray **scene transitions**, resulting in a cleaner and more natural narrative style. The model should now produce cleaner prose.