Lora character training
Good afternoon, I've run into a problem. I train character LoRAs and until now have trained them on Illustrious/NoobAI, but yesterday I started training a LoRA with the Anima standalone trainer and hit an issue: the character's white background doesn't go away even if I increase the prompt weight, and if I increase it too much, it causes a problem similar to the one in the picture. Can anyone help me understand what is going wrong? The training parameters were as follows:
gpu_ids = "0"
[training_arguments]
output_name = "Training_test_lora"
save_model_as = "safetensors"
max_train_epochs = 7
save_every_n_epochs = 1
sample_every_n_epochs = 1
log_with = "tensorboard"
learning_rate = 1
text_encoder_lr = 1
optimizer_type = "Prodigy"
optimizer_args = [ "weight_decay=0.01", "decouple=True" ]
lr_scheduler = "cosine"
lr_warmup_steps = 0
mixed_precision = "bf16"
save_precision = "bf16"
max_data_loader_n_workers = 2
gradient_accumulation_steps = 1
max_grad_norm = 1
gradient_checkpointing = true
flash_attn = false
torch_compile = false
lowram = false
blocks_to_swap = 0
persistent_data_loader_workers = true
seed = 42
cache_latents_to_disk = false
vae_batch_size = 1
cache_text_encoder_outputs_to_disk = false
disable_bucket_shuffle = true
multigpu_mode = "ddp"
deepspeed = false
use_cuda_direct = false
ddp_gradient_as_bucket_view = false
ddp_static_graph = false
use_fsdp = false
fsdp_sharding_strategy = "1"
fsdp_offload_params = false
fsdp_reshard_after_forward = false
fsdp_activation_checkpointing = false
fsdp_cpu_ram_efficient_loading = false
fsdp_backward_prefetch = ""
fsdp_forward_prefetch = false
fsdp_use_orig_params = true
fsdp_limit_all_gathers = true
fsdp_auto_wrap_policy = "NO_WRAP"
fsdp_min_num_params = 100_000_000
fsdp_transformer_layer_cls_to_wrap = ""
fsdp2_reshard_after_forward = true
fsdp2_offload_params = false
fsdp2_activation_checkpointing = false
fsdp2_cpu_ram_efficient_loading = false
fsdp2_auto_wrap_policy = "NO_WRAP"
fsdp2_min_num_params = 100_000_000
fsdp2_transformer_layer_cls_to_wrap = ""
step_profile = false
profile_microbatch = false
[network_arguments]
network_module = "networks.lora_anima"
network_dim = 24
network_alpha = 12
network_train_unet_only = true
network_dropout = 0.05
auto_resume_last_state = true
[anima_arguments]
timestep_sample_method = "logit_normal"
discrete_flow_shift = 3
weighting_scheme = "logit_normal"
You should probably stop using Prodigy as it's very likely using an LR that's way too high when Anima doesn't really want high LR in the first place (Prodigy almost always overshoots LR anyways and overfits hard). You should use AdamW/AdamW8bit instead
Yes, I tried AdamW/AdamW8bit and had the same problem: the background stayed white, and only occasionally did the requested background come through. I'm also not sure whether dataset tagging for characters differs from SDXL. I'm used to training with Prodigy and cosine, and out of habit I removed the backgrounds from the dataset images. Is that considered a mistake?
I tagged the dataset following a recommendation from a neural network that was based on Reddit comments, without the SDXL clutter:
Perstest, 1girl, perstestOutf, long hair, blue eyes, large breasts, brown hair, long sleeves, hair ornament, cleavage, very long hair, pointy ears, star pointy earrings, blue choker, virtual youtuber, beads hairclip, beads necklace, white kneehighs, red bow, clothing cutout, thigh strap, blue dress, white socks, short dress, frilled dress, cleavage cutout, single thighhigh, puffy long sleeves, blue nails, dragon horns, asymmetrical legwear, star hair ornament, dragon tail, bridal garter, uneven legwear, single sock, frilled kneehighs, demon wings, low wings, heart ahoge, heart-shaped pupils, criss-cross straps, criss-cross halter, blue skirt, plaid skirt, frilled skirt, pleated skirt, white thighhighs, cowboy shot,
You can use AdamW and lower the learning rate to around 0.00002. Tagging can be described using natural language. With an image repeat count of 10, train for 10 epochs; if it underfits, continue training. When tagging, you need to describe the background as well, otherwise the model will treat the background as part of the character's features. Here are some personal training suggestions: it is recommended to train on a Linux system, which can nearly double the speed. Using a mixed resolution of 512, 768, and 1024 for training can also speed up the process, and the results are almost identical to training at 1024 resolution.
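To make the point about describing the background concrete, here is a minimal sketch that flags captions which never mention the background. The hint list and the file names are hypothetical, and it assumes comma-separated captions like the ones posted above; adapt it to your own tagging vocabulary.

```python
# Hypothetical list of background phrases; extend it to match your own captions.
BACKGROUND_HINTS = ("background", "indoors", "outdoors")


def mentions_background(caption: str) -> bool:
    """Return True if any comma-separated tag contains a background hint."""
    tags = [tag.strip().lower() for tag in caption.split(",") if tag.strip()]
    return any(hint in tag for tag in tags for hint in BACKGROUND_HINTS)


def captions_missing_background(captions: dict[str, str]) -> list[str]:
    """Return the sorted names of captions that never describe the background."""
    return sorted(name for name, text in captions.items() if not mentions_background(text))


if __name__ == "__main__":
    # File names are placeholders; the captions are shortened from the example in this thread.
    sample = {
        "001.txt": "Perstest, 1girl, long hair, blue eyes, blue dress, cowboy shot,",
        "002.txt": "Perstest, 1girl, long hair, blue eyes, blue dress, white background",
    }
    print(captions_missing_background(sample))
```

A caption that this check flags is a candidate for an explicit background description such as "white background" or "transparent background", so the plain backdrop is not absorbed into the character.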
Below is the tagging prompt I personally use for large language models, which you can modify according to your needs. I personally use Gemini for tagging. You can download the accompanying script from my repository "kongbai-84/soultide_lora" at the "main" branch, and ask the large language model to explain how to use it and translate the interface into English.
Use English natural language to tag the characters in the images for the training set of an anime-specialized LoRA model. The specific rules are as follows:
- [Clothing Description] Each piece of clothing corresponds to only one independent tag, separated by English commas (e.g., white blouse, black pleated skirt, knee-high socks). Prohibit the use of wearing verbs such as "wearing", "dressed in", or "putting on"; instead, use noun phrases or prepositional phrases for direct descriptions (e.g., red scarf around neck, gloves on hands). Avoid general terms (like shoes, clothes) and use precise descriptions instead (like white platform sneakers, sheer stockings). Keep only the most appropriate one among synonymous tags for the same type of clothing, without repetition. One piece of clothing/accessory corresponds to exactly one prompt word. As long as it is a piece of clothing, it corresponds to only one prompt word. Do not use different prompt words just because the perspective changes while the clothing itself hasn't changed (do not change the prompt word for a specific piece of clothing, but also do not forcefully apply the complete prompt words to images where the corresponding clothing does not appear; adjust according to the visual cropping of the image).
- [Directional Description] Allow the use of spatial directional words for auxiliary positioning, such as on the left arm, around the waist, on the right wrist, etc.
- [Other Content] Retain tags describing character features (hair color, hairstyle, eye color, pupil shape, etc.), actions, and backgrounds. Describe actions using natural language. Add "@lhcx" at the very beginning of the prompt. Use natural language to describe the art style (do not use vague descriptions like "anime" or "exquisite anime illustration"; describe the style, painting method, and brushstrokes in detail). Keep only the most accurate synonym for character features. If the uploaded tags do not match the image or there are omissions, supplement or correct them based on the image content. When summarizing the character's face shape, put it in the "Others" category.
Use natural language to describe the character's actions in detail.
- The prompt format should be: Art style, a XXX picture of a girl named XXX, a girl named XXX has XXX appearance, a girl named XXX has XXX clothing, a girl named XXX performs XXX action. Replace XXX with the character's name and append the "(soul tide)" suffix. Include this suffix in the summary as well. Prioritize following the user's new instructions.
- [Summary] After completing the above organization, gather and arrange all clothing-related tags together.
IMPORTANT: Please strictly maintain the following format for your reply. Do not change the file name marker, and place it in a code block to prevent the '#' symbol from being swallowed, so that I can write it back to the file via a script:
FILE: filename.txt
tag1, tag2, tag3...
Output the tagging/modification logic first, then output the prompt words, and finally summarize the character and clothing features in the following format:
Character Features:
Clothing Features:
Props (leave blank if none):
Others:
Place the modified tags in a code block, and output the summary directly.
Note that classifications like expressions should be placed in "Others". Check to ensure the tags in the Plaintext are consistent with those in the summary, avoiding situations where tags exist in the summary but not in the Plaintext, or vice versa.
When tagging, use natural language to describe the character's appearance and clothing in detail. The character features in the summary should not include perspective. Expressions (eye color belongs to Character Features, while closed eyes belongs to Others) should be placed in "Others", breast size should be placed in "Others", and traits like a mole on the breast should be placed in "Character Features".
When summarizing, the character, clothing, and props should not carry the prefix "a girl named XXX"; only place the character's name at the beginning of the character summary.
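The prompt asks the LLM to reply with a "FILE: filename.txt" marker followed by the tag line so that a script can write the tags back. The sketch below is not the script from the repository mentioned above; it is a hypothetical parser showing one way such a reply could be read. The function name, the tolerance for a leading '#', and the handling of code fences are all assumptions.

```python
import re

# Matches a marker line such as "FILE: 001.txt". A leading '#' is tolerated because
# the prompt mentions a '#' symbol around the marker; that detail is an assumption.
FILE_MARKER = re.compile(r"^\s*#*\s*FILE:\s*(?P<name>\S+)\s*$")
FENCE = "`" * 3  # code-fence delimiter, built this way so the example stays fence-safe


def parse_tag_reply(reply: str) -> dict[str, str]:
    """Map each file name in an LLM reply to the tag line that follows its marker."""
    result: dict[str, str] = {}
    pending = None  # file name still waiting for its tag line
    for line in reply.splitlines():
        stripped = line.strip()
        marker = FILE_MARKER.match(line)
        if marker:
            pending = marker.group("name")
        elif pending and stripped and not stripped.startswith(FENCE):
            result[pending] = stripped
            pending = None
    return result


if __name__ == "__main__":
    reply = "\n".join([
        "Tagging logic: kept one tag per garment.",
        FENCE,
        "FILE: 001.txt",
        "tag1, tag2, tag3",
        FENCE,
        "Character Features:",
    ])
    print(parse_tag_reply(reply))
```

Writing the result back is then a loop over the dictionary that saves each tag line next to the matching image. File names containing spaces would need a looser pattern than the one used here.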
Thanks for the reply. I'll try to rework my training approach, but I still don't understand the tag description. Was it a prompt for "labeling" or a guide on how to label a dataset yourself?
I'm still trying to find training parameters, as I've got transfer results so far, but things like "jewelry" often transfer incorrectly. I'm also still having problems with the transfer itself: the background behind the character stubbornly doesn't change, even when I specify something like "girl standing in bathroom." It often makes the background white or partially changes, mixing in a plain background. I've also encountered a problem I'm trying to solve: clothing sticking to the character. For example, my character has three poses: naked, in a dress, and in a sweater. During subsequent generations, the character is generated in the correct pose, but now wearing clothing that shouldn't be there according to the prompt. I also can't understand why the Anima model stubbornly pushes any censorship on my picture. Even if I write, for example, "the character changes clothes in the shower," it pushes censorship on my chest... and it would be fine if this happened on a woman's chest, but sometimes it also covers a man's chest with a haze HD, so I don't yet know how to deal with the censorship and the problem with training.
The part I provided is a prompt intended for Large Language Models. You can send that prompt along with your images to Gemini or other multimodal models with vision capabilities, and retrain using the tags generated by the LLM. This should solve the background and clothing overfitting issues you are encountering. Adding tags like "nsfw" and "uncensored" and increasing their weights to 2-8 should help avoid the issue of constantly getting censored images. Alternatively, you could try changing the reference artist. Constantly generating censored images is usually caused by overfitting, which happens when all of the artist's works in the dataset are censored.

You can try using a large language model for natural language tagging (the @lhcx at the beginning is not required).
My dataset has the following framing breakdown. It contains 46 images with transparent backgrounds: 9 cowboy shots, 14 upper-body images, 6 full-body images, 5 lower-body images, and 1 close-up body image.
I didn't mean that you should only train on close-ups of the face. You can use the same images multiple times by changing the framing. This way, besides artificially padding the dataset, you help the model focus on the specific elements you want to train. That's the whole point of cropping
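As a rough illustration of reusing one image at several framings, the sketch below computes crop boxes for a full-body source image. The framing fractions are hypothetical and would need adjusting per image; the boxes use the (left, upper, right, lower) order that Pillow's `Image.crop` expects, but the code itself only does the arithmetic and needs no image library.

```python
def crop_box(width: int, height: int, top: float, bottom: float) -> tuple[int, int, int, int]:
    """Return a (left, upper, right, lower) pixel box covering a vertical slice of the image."""
    if not 0.0 <= top < bottom <= 1.0:
        raise ValueError("top and bottom must satisfy 0 <= top < bottom <= 1")
    return (0, round(height * top), width, round(height * bottom))


# Hypothetical framing fractions for a standing full-body image; tune them per image.
FRAMINGS = {
    "full body": (0.0, 1.0),
    "cowboy shot": (0.0, 0.65),
    "upper body": (0.0, 0.45),
    "close-up": (0.0, 0.25),
}


def framing_boxes(width: int, height: int) -> dict[str, tuple[int, int, int, int]]:
    """Compute one crop box per framing for an image of the given size."""
    return {name: crop_box(width, height, top, bottom) for name, (top, bottom) in FRAMINGS.items()}


if __name__ == "__main__":
    for name, box in framing_boxes(1000, 2000).items():
        print(name, box)
```

Each crop would presumably need its own caption, since garments that fall outside the crop should not stay in the tags for that image.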
Hello everyone, I'm back with some findings regarding my initial questions. First, I'm attaching the parameters I consider to be working in the Anima standalone trainer. (P.S. I can't yet provide precise data on general parameters, but Prodigy + cosine has some rather specific quirks with Anima...) These are the parameters that worked well when training models:
[training_arguments]
output_name = "ModelNameTestv15"
save_model_as = "safetensors"
max_train_epochs = 8 - 15
Batch Size = 2 or 4
save_every_n_epochs = 1
sample_every_n_epochs = 1
log_with = "tensorboard"
learning_rate = 0.2-0.5
text_encoder_lr = 0 or = LR
optimizer_type = "Prodigy"
optimizer_args = [ "weight_decay=0.01", "decouple=True" ]
lr_scheduler = "cosine"
lr_warmup_steps = 80 - 100
mixed_precision = "bf16"
save_precision = "bf16"
max_data_loader_n_workers = 4
gradient_accumulation_steps = 1
max_grad_norm = 1
gradient_checkpointing = true
flash_attn = true
torch_compile = false
lowram = false
blocks_to_swap = 0
persistent_data_loader_workers = true
seed = 42
cache_latents_to_disk = true
vae_batch_size = 1
cache_text_encoder_outputs_to_disk = false
multigpu_mode = "ddp"
deepspeed = false
use_cuda_direct = false
ddp_gradient_as_bucket_view = false
ddp_static_graph = false
use_fsdp = false
fsdp_sharding_strategy = "1"
fsdp_offload_params = false
fsdp_reshard_after_forward = false
fsdp_activation_checkpointing = false
fsdp_cpu_ram_efficient_loading = false
fsdp_backward_prefetch = ""
fsdp_forward_prefetch = false
fsdp_use_orig_params = true
fsdp_limit_all_gathers = true
fsdp_auto_wrap_policy = "NO_WRAP"
fsdp_min_num_params = 100_000_000
fsdp_transformer_layer_cls_to_wrap = ""
fsdp2_reshard_after_forward = true
fsdp2_offload_params = false
fsdp2_activation_checkpointing = false
fsdp2_cpu_ram_efficient_loading = false
fsdp2_auto_wrap_policy = "NO_WRAP"
fsdp2_min_num_params = 100_000_000
fsdp2_transformer_layer_cls_to_wrap = ""
step_profile = false
profile_microbatch = false
[network_arguments]
network_module = "networks.lora_anima"
network_dim = 32
network_alpha = 16
network_train_unet_only = true
auto_resume_last_state = true
[anima_arguments]
timestep_sample_method = "logit_normal"
discrete_flow_shift = 3
weighting_scheme = "logit_normal"
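To put the 80 - 100 warmup steps in context, here is a rough step-count estimate. It assumes the trainer counts steps the way kohya-style scripts usually do (images times repeats, divided by batch size, times epochs), which is not confirmed for the Anima standalone trainer. The 46-image dataset size comes from earlier in this thread; the repeat counts are assumptions, since the config does not state one.

```python
import math


def total_optimizer_steps(images: int, repeats: int, epochs: int,
                          batch_size: int, grad_accum: int = 1) -> int:
    """Estimate optimizer steps for a run; bucketing can change the real number slightly."""
    batches_per_epoch = math.ceil(images * repeats / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return steps_per_epoch * epochs


def warmup_fraction(warmup_steps: int, total_steps: int) -> float:
    """Share of the whole run that is spent in warmup."""
    return warmup_steps / total_steps


if __name__ == "__main__":
    # 46 images, batch size 2, 8 epochs; repeats of 1 and 10 are both assumptions.
    for repeats in (1, 10):
        steps = total_optimizer_steps(images=46, repeats=repeats, epochs=8, batch_size=2)
        print(repeats, steps, f"{warmup_fraction(80, steps):.0%}")
```

With a single repeat the run is only 184 steps long, so 80 warmup steps would cover a large part of it; with 10 repeats the same warmup is a small fraction of 1840 steps.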
I'd also like to point out a few observations I made when training and using character LoRAs with this method:
- (Assumption) Because there is no UNet-only training, the model barely responds to Danbooru tags when I try to edit the character's appearance, but it responds perfectly to natural-language (NL) prompts. This means the model barely remembers the character's costumes, but does remember that the character can be dressed or undressed to any degree.
- I can't yet try training on a custom checkpoint (a fine-tune or a checkpoint merge), because when training on such models begins, the DiT layer is often missing. I can't yet confirm whether this is a mistake on the part of the people doing the training or a feature of Anima in general.
- Image output dependence on "Score"
To answer briefly, I noticed that without Score 7-9 in the positive prompt and Score 1-3 in the negative prompt, the models behave quite strangely. For example, without LoRAs trained with my method, the model is quite responsive and convenient and generates images reliably, though with distortions (it's a neural network, come on!!)... BUT if you use models trained with my method and the specified Score 6 in positive and negative prompts, backgrounds begin to appear and the character no longer looks so rough...
- Upscale and Adetailer models
To summarize these points briefly: Adetailer models behave a little strangely, and the upscaler grains the characters.
To elaborate, Adetailer models sometimes manage to fix a seemingly lost detail, but they also very often ruin it... Perhaps this is because I use Forge Neo. For now I can say that the old models work, but I can't yet tell whether they work better or worse than on SDXL.
Upscaling is a different story. I tried different upscaling factors and models. For example, results were reliable at 1.5-1.85x upscaling. If I increase the factor to ~1.87, the character outline becomes inconsistent and bald spots appear. A grainy effect also appears on the character. For example, if the character has fur or other "fluffy" details, the graininess is very visible. A fluffy fox tail becomes a "polka-dot tail," and the same happens with sweaters and other fluffy items.
- NSFW content
Testing in this area reveals some interesting issues, such as a "ray of salvation" or "censorship fog." This effect doesn't happen constantly, but it often ruins shots, especially considering that I often want to recreate scenes from ecchi anime without censorship, only to be overtaken by a "ray of light" in the abyss of emptiness or fog appearing underwater...
For now, these are all my comments and observations. I tried to write constructively and to the point. Perhaps my research will help someone... I hope it does.
Not only is using Prodigy on Anima not really justified, because it inflates the LR, which the model absolutely hates, but you also add a warmup on top of that, which goes against the purpose of an adaptive optimizer. If you don't trust the community's advice, at least look at the devs' LoRA example; all the configs from diffusion-pipe are right there.
And I don't understand what kind of improvements from the Adetailer models you're talking about, given that they don't do anything on their own besides object detection. If you set up the workflow correctly, any detailing turns out great, even at base resolutions. As for upscaling, hires fix above 1.5x is unstable; for higher values you need to use the tile method.
Yeah, I agree with what @degurshaft said... Using Prodigy is a massive trap on Anima. You could get away with it on SDXL, but on Anima the higher LR will constantly overfit your LoRAs and make training much more annoying. Also, the proper way to "tune" Prodigy is to keep the LR at 1 and change the D coefficient (d_coef); using lower LRs is a much hackier approach that makes it even more of a pain (and this is on top of using alpha = half dim). In either case, though, if you need to tune Prodigy... you should just be using AdamW8Bit, which is going to be better anyway and, as mentioned, works properly with warmup. You can also just set network dim and alpha to 16/16 and forget about it.
Also, you mentioned that the trainer lacks the UNet-only option, but it is in your config under network arguments. Unless you're referring to the LLM adapter, which is disabled by default in sd-scripts even if you don't pass train_llm_adapter.
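For anyone who wants to try the d_coef route, here is a minimal sketch of how the extra argument could be written in the same "key=value" string format that optimizer_args already uses in the config above. The helper name format_optimizer_args and the value 0.5 are purely illustrative, and whether a given trainer actually forwards d_coef to Prodigy is an assumption you would need to check for your setup.

```python
def format_optimizer_args(**kwargs):
    """Render keyword arguments as "key=value" strings, the format used by optimizer_args."""
    return [f"{key}={value}" for key, value in kwargs.items()]


# Hypothetical values: the learning rate stays at 1 in the config,
# and d_coef (0.5 here is only an illustration) scales Prodigy's step-size estimate.
prodigy_args = format_optimizer_args(weight_decay=0.01, decouple=True, d_coef=0.5)
print(prodigy_args)
```

The resulting list can be pasted into the optimizer_args line of the config, provided the trainer passes unknown keys through to the optimizer.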
It seems like you can't edit d_coef through the GUI in the standalone trainer, and even looking under the hood at the toml file, I haven't seen it there. Come to think of it, I don't remember being able to tweak betas there either. From my experience with Illustrious, I noticed some difference between, for example, betas of 0.9 0.99 and 0.9 0.99 0.999, so I wonder how this trainer handles them.
And btw, the reason training a LoRA on the merges didn't work was most likely that the merged checkpoints had the model.diffusion_model prefix on their layer keys. Just renaming the keys should be enough to make everything work.
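A rough sketch of the renaming, assuming the fix is simply to strip a leading "model.diffusion_model." from every key that has it. The function names and the exact prefix string (including the trailing dot) are my assumptions, not something taken from the trainer; check the key names the trainer actually expects before relying on this.

```python
def strip_key_prefix(state_dict, prefix="model.diffusion_model."):
    """Return a copy of state_dict with `prefix` removed from keys that start with it."""
    renamed = {}
    for key, tensor in state_dict.items():
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        if new_key in renamed:
            # Two different source keys would map to the same name; refuse to overwrite.
            raise ValueError(f"key collision after renaming: {new_key}")
        renamed[new_key] = tensor
    return renamed


def convert_checkpoint(src_path, dst_path, prefix="model.diffusion_model."):
    """Load a .safetensors file, strip the prefix from its keys, and save the result."""
    # Imported lazily so strip_key_prefix can be used without safetensors installed.
    from safetensors.torch import load_file, save_file

    save_file(strip_key_prefix(load_file(src_path), prefix), dst_path)
```

Note that convert_checkpoint writes only the tensors; any metadata stored in the original file is not copied over in this sketch.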
I would recommend switching to a trainer like this fork of Lora Easy Training Scripts instead, which has all of these parameters exposed, or just editing the config generated by that trainer and passing it to sd-scripts directly. As for optimizer betas... I mean, you can mess with them I guess, but I've honestly never found a case where it was actually necessary to change them from the optimizer defaults. Overall I still really don't recommend using Prodigy; it's just not worth the effort when you have to tune parameters anyway, since you can do that much more easily with AdamW.
I don't even use the standalone trainer anymore, I just switched to diffusion-pipe. That was directed more at the author of the discussion, since he seems to be using it.
I do listen to the community's opinion, but what can I do if, out of 20 training attempts with different configurations, only these settings gave me a stable result without losing key character features... I was planning to run an experiment with AdamW8bit this week. I just shared the tests for people who might be interested, although I understand that I am far from professionals like you.

