ChenkinNoob-XL-V0.3-BETA
Beta builds are preview-only while training continues; current checkpoints do not reflect final quality.
Overview
ChenkinNoob-XL-V0.3-BETA is still in active training, with the dataset already extended through 2026-01-07. This beta focuses on robustness (dropout), smarter sampling (repeat strategy), and faster throughput (cached latents). Community testers should treat it as an experimental checkpoint while we gather feedback ahead of the stable V0.3 release.
Key Changes
1. Hierarchical Dropout System
- Global Dropout: At configurable probabilities, entire character feature blocks are dropped from training captions, forcing better generalization when prompts underspecify protagonists.
- Quality Metadata Dropout: Quality labels (e.g., aesthetic, excellent), resolution tags, year tags (newest/oldest), and safety tiers now have higher dropout likelihoods to prevent overfitting to canned descriptors.
- Character & Artist Dropout: Character and artist tags randomly drop at low probabilities, improving resilience when users omit exact names.
- General Tags: Generic descriptors (clothing, props, mood, etc.) also undergo light dropout to reduce repetition artifacts.
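A minimal sketch of how such hierarchical tag dropout might look during caption preprocessing. The probabilities below are illustrative placeholders, since the card does not publish the configured rates, and the group names are assumptions:

```python
# Hierarchical tag dropout sketch; all rates are placeholders, not the
# training values.
import random

GLOBAL_CHARACTER_BLOCK_DROP = 0.10  # placeholder: drop the whole character block

DROP_PROBS = {                      # placeholder per-tag rates
    "quality": 0.50,                # aesthetic/excellent, resolution, year, safety
    "character": 0.05,              # low-probability character-tag dropout
    "artist": 0.05,                 # low-probability artist-tag dropout
    "general": 0.10,                # light dropout on generic descriptors
}

def apply_dropout(tags_by_group: dict[str, list[str]]) -> list[str]:
    """Return the surviving caption tags after hierarchical dropout."""
    # Global dropout: occasionally remove the entire character feature block
    # so the model learns to render plausible protagonists from sparse prompts.
    if random.random() < GLOBAL_CHARACTER_BLOCK_DROP:
        tags_by_group = {g: t for g, t in tags_by_group.items() if g != "character"}
    kept: list[str] = []
    for group, tags in tags_by_group.items():
        p = DROP_PROBS.get(group, 0.0)
        kept.extend(t for t in tags if random.random() >= p)
    return kept
```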
2. Repeat (re) Scheduling
Fine-grained repeat logic balances underrepresented and oversampled tags (especially characters/artists):
| Image Count per Tag | Repeat Value |
|---|---|
| 30–83 | 7 |
| 83–100 | 5 |
| 100–200 | 3 |
| 200–400 | 2 |
| 400–500 | 1 (unchanged) |
| > 500 | Randomly drop excess entries toward ~500 images |
This keeps niche characters/illustrators visible without letting popular tags dominate.
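As a concrete reading of the schedule, here is a small sketch mapping an image count to its repeat value. Treating each lower bound as inclusive and each upper bound as exclusive is an assumption, since adjacent table rows share boundary values (e.g., 83 and 100):

```python
# Repeat schedule from the table above; boundary handling is an assumption.
import random

def repeat_value(image_count: int) -> int:
    """Map a tag's image count to its repeat multiplier."""
    if 30 <= image_count < 83:
        return 7
    if 83 <= image_count < 100:
        return 5
    if 100 <= image_count < 200:
        return 3
    if 200 <= image_count < 400:
        return 2
    # 400-500 is unchanged; >500 is downsampled instead (below). This branch
    # also covers counts under 30, which the card leaves unspecified.
    return 1

def downsample(images: list[str], target: int = 500) -> list[str]:
    """For tags with more than ~500 images, randomly drop the excess."""
    return images if len(images) <= target else random.sample(images, target)
```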
3. cache_latents Enabled
- Latent caching is active across the training pipeline, reducing recomputation overhead.
- Training now takes ~4 days per epoch, a significant speed-up over previous versions, enabling faster iteration on beta feedback.
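For readers unfamiliar with the technique, a minimal sketch of latent caching in a diffusers-based loop follows: each image is encoded through the frozen SDXL VAE once, and the saved latents are reused every epoch. The VAE repo id, file layout, and helper names are assumptions, not the pipeline's actual implementation:

```python
# Latent caching sketch (diffusers); details are illustrative assumptions.
from pathlib import Path

import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16
).to("cuda").eval()

@torch.no_grad()
def cache_latents(pixel_values: torch.Tensor, out_path: Path) -> None:
    """pixel_values: (N, 3, H, W) scaled to [-1, 1]; written once, reused per epoch."""
    latents = vae.encode(pixel_values.to("cuda", torch.float16)).latent_dist.sample()
    latents = latents * vae.config.scaling_factor  # 0.13025 for the SDXL VAE
    torch.save(latents.cpu(), out_path)

def load_latents(path: Path) -> torch.Tensor:
    """Epochs after the first read the cached tensor instead of re-encoding."""
    return torch.load(path)
```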
4. Dataset Refresh
- Training data now includes Danbooru and allied sources up to 2026-01-07, ensuring the beta reflects the latest style trends and character additions.
5. e621-specific Strategy
Because e621 content remains a small fraction of the corpus, the following selective training policy is applied to avoid skew:
| Tag Group | Strategy |
|---|---|
| Artists | Always dropped (not trained) |
| Copyrights | 50% probability to drop all related tags |
| Characters | 20% probability to drop all |
| Species | Randomly drop 20% of tags |
| General | Randomly drop 70% of tags |
| e621 tag | 20% probability to drop |
| Resolution | 70% probability to drop |
| Year | Always dropped (not trained) |
The goal is to keep the model aware of e621-style cues without letting it override the mainstream anime distribution.
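A small sketch of how this per-group policy could be applied to a caption. The group names follow e621's tag categories, and which rows drop whole groups versus individual tags is partly an assumption, since the card lists only probabilities:

```python
# e621 tag policy sketch from the table above; group names and the
# whole-group vs. per-tag split are assumptions.
import random

ALWAYS_DROP = {"artist", "year"}                      # never trained on
GROUP_DROP = {"copyright": 0.50, "character": 0.20,   # drop the whole group
              "e621": 0.20, "resolution": 0.70}
PER_TAG_DROP = {"species": 0.20, "general": 0.70}     # drop individual tags

def filter_e621_tags(tags_by_group: dict[str, list[str]]) -> list[str]:
    """Apply the selective policy and return the surviving caption tags."""
    kept: list[str] = []
    for group, tags in tags_by_group.items():
        if group in ALWAYS_DROP:
            continue
        if random.random() < GROUP_DROP.get(group, 0.0):
            continue
        p = PER_TAG_DROP.get(group, 0.0)
        kept.extend(t for t in tags if random.random() >= p)
    return kept
```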
Early Strengths (Beta Highlights)
- Single-tag popular characters: Characters with ≥200 Danbooru images frequently render correctly from the character tag alone or with only a few outfit tags; full caption copies are rarely needed (see the sketch after this list).
- Improved long-tail coverage: Niche characters and illustrators receive better attention thanks to the repeat scheduling, reducing the need for elaborate prompting.
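As a usage illustration, a minimal diffusers sketch follows. It assumes the beta checkpoint is (or will be) published in diffusers format under the repo id from the model tree at the bottom of this card; the character tag and sampler settings are placeholders, not recommendations:

```python
# Single-tag generation sketch; repo availability in diffusers format and all
# prompt/sampler choices are assumptions.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "ChenkinNoob/ChenkinNoob-XL-V0.3-BETA", torch_dtype=torch.float16
).to("cuda")

# A popular character often needs only its tag plus the usual quality tags.
image = pipe(
    prompt="hatsune miku, masterpiece, best quality",
    negative_prompt="worst quality, low quality",
    num_inference_steps=28,
).images[0]
image.save("single_tag_sample.png")
```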
(Two additional experiments are underway to further extend these strengths; see “Ongoing Experiments” below.)
Ongoing Experiments
We are running several research tracks in parallel to guide the path toward the stable V0.3+ releases:
- Rectified Flow pilots exploring flow-based guidance to prioritize training samples dynamically.
- FP8 training tests for faster, more memory-friendly fine-tunes.
- SDXL distillation experiments targeting ~8-step generation for faster sampling.
- Post-training aesthetic refinements to fine-tune visual taste profiles.
Usage Notes
- Status: Beta preview only; expect rapid iteration and potential breaking changes.
- Prompting: Continue using V0.2 seed prompts as a baseline, but pay attention to how dropout affects dependency on high-level descriptors.
- Feedback: Report findings in the closed-beta Discord threads so we can prioritize fixes before shipping ChenkinNoob-XL-V0.3 stable.
Model tree for ChenkinNoob/ChenkinNoob-XL-V0.3-BETA
Base model: Laxhar/noobai-XL-0.5