---
license: apache-2.0
tags:
  - finetune
  - roleplay
  - chat
  - wings-of-fire
  - nsfw
  - not-for-all-audiences
base_model:
  - google/gemma-4-31B-it
---
Send me your support to help me feed the data beast! Also taking commissions for universe-specific models.
Support on Ko-fi

This model uses the standard Gemma 4 chat template. The structure is complex and requires specific configuration for correct performance. Please read the following sections carefully.
Human-Readable Format:

```
<bos><|turn>system
<|think|>{system prompt}<turn|>
<|turn>user
{user message}<turn|>
<|turn>model
<|channel>thought
*{reasoning}*<channel|>*{response}*<turn|>
```
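The turn structure above can be assembled programmatically. A minimal sketch, assuming the single-turn case only (the `render_prompt` helper is illustrative and not part of any released tokenizer; the reasoning channel is opened later by the inference-time prefill, not by this template):

```python
def render_prompt(system: str, user: str) -> str:
    """Assemble the human-readable Gemma 4 turn structure shown above."""
    return (
        "<bos><|turn>system\n"
        f"<|think|>{system}<turn|>\n"
        "<|turn>user\n"
        f"{user}<turn|>\n"
        "<|turn>model\n"
    )

prompt = render_prompt("You are a storyteller.", "Begin the tale.")
print(prompt)
```

In practice the server renders this for you from the model's bundled chat template; the helper only makes the token layout explicit.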
Strongly recommended: Always use the /v1/chat/completions endpoint. Correct formatting is handled automatically by the server.
Text completion (/v1/completions) is currently broken for impersonations without a custom patch. Sending the context as a user prompt and asking the model to generate the response can produce a malformed turn structure.
If you are using SillyTavern, make sure your connection is set to Chat Completion mode, not Text Completion / Instruct mode.
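A chat-completion request might look like the sketch below. The endpoint URL is a placeholder for your own deployment, and the extra top-level parameters shown here assume a vLLM server (they are covered in more detail in the API section further down):

```python
import json

# Payload for POST /v1/chat/completions on an OpenAI-compatible server.
payload = {
    "model": "Darkhn/Gemma-4-31B-Animus-V14.0",
    "messages": [
        {"role": "system", "content": "You are a storyteller."},
        {"role": "user", "content": "Describe the room."},
    ],
    "temperature": 1.0,
    "min_p": 0.02,
    # vLLM-specific extras; other servers may expect different names.
    "chat_template_kwargs": {"enable_thinking": True},
    "skip_special_tokens": False,
}

body = json.dumps(payload)
# Send with e.g. requests.post("http://localhost:8000/v1/chat/completions",
#                              data=body, headers={"Content-Type": "application/json"})
```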
This model was trained with asterisks (*) as content delimiters for both reasoning and the final response. This has a critical inference implication:
The opening channel token <|channel>thought is a special token that gets masked during training. As a result, the model cannot generate it on its own — it must be injected as a prefill at generation time.
You must configure your inference setup to prefill the model's turn with:

```
<|channel>thought\n*
```

(prefill only a single asterisk after the newline). Without this prefill, the model will either fail to reason or produce malformed output. Most frontends support this via a "Start Reply With" or "Output Prefix" setting. See the SillyTavern settings files for a working configuration.
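If your frontend has no prefill setting, one way to inject it over an OpenAI-compatible API is vLLM's `continue_final_message` extension: end the messages list with a partial assistant turn and ask the server to continue it rather than open a new one. A sketch (the model name is a placeholder; frontends with a "Start Reply With" field do this for you):

```python
import json

payload = {
    "model": "Darkhn/Gemma-4-31B-Animus-V14.0",
    "messages": [
        {"role": "user", "content": "Tell me about the rain."},
        # Partial assistant turn carrying the required prefill.
        {"role": "assistant", "content": "<|channel>thought\n*"},
    ],
    # vLLM extras: continue the last message instead of starting a new turn.
    "add_generation_prompt": False,
    "continue_final_message": True,
    "skip_special_tokens": False,
}
body = json.dumps(payload)
```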
When using an OpenAI-compatible API, you must add custom parameters to control thinking and ensure correct token handling. In frontends like SillyTavern, add the following JSON to the "Custom Parameters" (or extra_body) field in your connection settings:
"chat_template_kwargs": {"enable_thinking": true}
"skip_special_tokens": false
"enable_thinking": true: Activates the model's chain-of-thought reasoning. Change to false to disable it for faster, more direct responses."skip_special_tokens": false: This is critical. It prevents the API from stripping away the special tokens (like <|turn|> and <|channel|>) that are required for the model's chat template to work correctly.Example command:
```shell
vllm serve Darkhn/Gemma-4-31B-Animus-V14.0 \
  --tensor-parallel-size 4 \
  --max-model-len auto \
  --cudagraph-capture-sizes 1 \
  --port 8000 \
  --gpu-memory-utilization 0.95 \
  --enable-prefix-caching \
  --enable-auto-tool-choice \
  --tool-call-parser gemma4 \
  --limit-mm-per-prompt '{"image":0,"audio":0}' \
  --async-scheduling \
  --max-num-seqs 1 \
  --enable-chunked-prefill \
  --disable-custom-all-reduce \
  --api-key YOURKEY \
  --trust-remote-code
```
Note: Do not use --reasoning-parser gemma4 unless your frontend supports the reasoning_content field in the API response. Most frontends (including SillyTavern) handle reasoning stripping themselves using the prefix/suffix configured in their settings.
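If your client receives the raw text with the reasoning still embedded, the thought channel can be stripped with a small helper. This is an illustrative sketch, not the parser any frontend ships; it assumes the delimiters shown in the template above:

```python
import re

def strip_reasoning(raw: str) -> str:
    """Remove the thought channel from a raw model turn.

    Assumes the template's delimiters: reasoning sits between
    <|channel>thought and <channel|>, and the reply follows <channel|>.
    """
    # Drop everything from the channel opener through its closer.
    cleaned = re.sub(r"<\|channel>thought.*?<channel\|>", "", raw, flags=re.DOTALL)
    # Drop the end-of-turn marker if present.
    return cleaned.replace("<turn|>", "").strip()

raw = "<|channel>thought\n*Hmm, a storm.*<channel|>*Rain lashed the glass.*<turn|>"
print(strip_reasoning(raw))  # *Rain lashed the glass.*
```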
Use `enable_thinking` to select reasoning at runtime; the `<|think|>` token in the system turn is what activates chain-of-thought reasoning at the architecture level.

The quantized model files are available for download. Click the button below to view the files.
Download GGUF Files →

For the best roleplaying experience, it is highly recommended to use the provided character card and lore book. These files help guide the model's persona and provide rich, in-universe context.
Download Files →

For a seamless setup in SillyTavern, you can download pre-configured sampler presets. These are tuned to provide an optimal balance between creativity and narrative coherence for this model.
Simply download the .json file below and import it into SillyTavern's sampler presets menu.
Temperature: 1.0
Min P: 0.02
For the best results, use this structured format. This helps the AI clearly distinguish between actions, inner thoughts, and dialogue.
```
*He walked across the room and stared out the window.*
*-I wonder what she's thinking.-*
Alex (Curious): "What do you see out there?"
```

Standard novel-style formatting is also understood, but this structured format is preferred for clarity.
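For tooling that post-processes logs in this format, the three element types can be told apart mechanically. A hypothetical sketch (the labels are my own, not anything the model emits):

```python
import re

def classify(line: str) -> str:
    """Label one line of the structured roleplay format.

    *-...-*  inner thought,  *...*  action,  Name (Emotion): "..."  dialogue.
    """
    if re.fullmatch(r"\*-.*-\*", line):      # check thoughts before actions,
        return "thought"                      # since *-...-* also matches *...*
    if re.fullmatch(r"\*.*\*", line):
        return "action"
    if re.match(r'[^:]+\([^)]*\):\s*"', line):
        return "dialogue"
    return "narration"

print(classify('Alex (Curious): "What do you see out there?"'))  # dialogue
```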
Click the button below to view a full, unedited chatlog demonstrating the model's narrative style and character portrayal.
View Chatlog Example →

This is Version 14.0 in the Animus series. V14.0 is built on google/gemma-4-31B-it, offering a massive leap in parameter count and underlying logic compared to previous versions.
V14.0's strength comes from a novel dataset designed to teach the model the why behind the lore, not just the what. The training data has been heavily expanded for this version.
The result is a model with exceptionally strong prose and a deep grasp of in-universe lore, making for a highly immersive and accurate roleplaying experience.
Note: for roleplay, the model closely follows the system prompt and the first message, meaning that if the first assistant message is short, the following messages will also be short.
V14.0 marks a shift from model merging to a focused, direct fine-tuning approach. This allows for greater control over the final model's characteristics.
The V14.0 dataset has been significantly expanded from previous versions.
All datasets underwent a rigorous cleaning process to remove formatting artifacts, including stray **scene transitions**, resulting in a cleaner and more natural narrative style. The model should now produce cleaner prose.