# SoulSplats LAM LoRA v1: Diverse Gaussian Splat Avatars

LoRA adapter weights for the LAM (Large Avatar Model) transformer decoder, finetuned on ethnically balanced face data to improve avatar quality across all skin tones.
## Model Details

| Property | Value |
|---|---|
| Base model | LAM (aigc3d/LAM, SIGGRAPH 2025): DINOv2 ViT-L/14 encoder + 10-layer transformer decoder + ~20K Gaussian splats |
| Adapter type | LoRA (Low-Rank Adaptation) |
| LoRA rank | 16 |
| LoRA alpha | 16.0 |
| Trainable params | 1.97M / 559M total (0.35%) |
| Layers adapted | 60 attention projection layers (Q, K, V, output) in the transformer decoder |
| Precision | bf16 mixed precision |
| Framework | PyTorch |
## Training Details

### Data

- Primary dataset: FairFace, 108K images balanced across 7 race groups
- Curation: 3,000 images balanced to 500 per Fitzpatrick skin type (I-VI) using ITA (Individual Typology Angle) classification (see the sketch after this list)
- Preprocessing: center-cropped to square, resized to 512x512, composited onto a mid-grey (0.5) background to reduce luminance halo artifacts on dark skin tones
- FLAME tracking: default neutral FLAME shape parameters (zero-initialized, 300-dim full PCA space)
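The compositing and ITA classification steps can be sketched as follows. This is a minimal illustration, not the project's actual curation script; the ITA-to-Fitzpatrick cut-offs follow the commonly used del Bino banding and are an assumption about how this dataset was binned.

```python
import numpy as np
from skimage import color  # scikit-image

def composite_on_grey(rgb, alpha, grey=0.5):
    """Composite an RGBA face crop onto a mid-grey (0.5) background.

    rgb: float array in [0, 1], shape (H, W, 3); alpha: (H, W, 1).
    """
    return rgb * alpha + grey * (1.0 - alpha)

def ita_degrees(skin_rgb):
    """Individual Typology Angle: arctan((L* - 50) / b*) in degrees,
    computed from mean CIELAB values over a skin patch."""
    lab = color.rgb2lab(skin_rgb)
    L, b = lab[..., 0].mean(), lab[..., 2].mean()
    return np.degrees(np.arctan2(L - 50.0, b))

def fitzpatrick_from_ita(ita):
    """Approximate Fitzpatrick type (1-6) from ITA.

    Cut-offs use the del Bino banding (assumed, not confirmed to
    match this dataset's curation exactly).
    """
    for threshold, fitz in [(55, 1), (41, 2), (28, 3), (10, 4), (-30, 5)]:
        if ita > threshold:
            return fitz
    return 6
```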
### Fitzpatrick Distribution

| Type | Count | Skin tone |
|---|---|---|
| I | 500 | Very light |
| II | 500 | Light |
| III | 500 | Medium light |
| IV | 500 | Medium |
| V | 500 | Medium dark |
| VI | 500 | Dark |
### Training Configuration

| Parameter | Value |
|---|---|
| Optimizer | AdamW (beta1=0.9, beta2=0.95) |
| Learning rate | 1e-4 with cosine decay |
| Warmup steps | 500 |
| Weight decay | 0.01 |
| Batch size | 1 (gradient accumulation 8, effective batch 8) |
| Epochs | 20 |
| Total steps | 6,890 |
| Training time | ~40 hours on an RTX 3080 Ti (12 GB) |
| Gradient clipping | 1.0 max norm |
| Gradient checkpointing | Enabled |
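A minimal PyTorch sketch of this optimization setup: linear warmup over the first 500 steps, then cosine decay over the remaining steps (decaying to zero is assumed; the card only says "cosine decay"). `lora_params` is a placeholder for the 1.97M trainable LoRA parameters.

```python
import math
import torch

WARMUP_STEPS, TOTAL_STEPS = 500, 6890

optimizer = torch.optim.AdamW(
    lora_params,  # placeholder: only the LoRA A/B matrices are trained
    lr=1e-4,
    betas=(0.9, 0.95),
    weight_decay=0.01,
)

def lr_lambda(step):
    # Linear warmup, then cosine decay to zero.
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Each optimizer step (after accumulating 8 micro-batch gradients):
# torch.nn.utils.clip_grad_norm_(lora_params, max_norm=1.0)
# optimizer.step(); scheduler.step(); optimizer.zero_grad()
```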
### Loss Function

| Component | Weight | Description |
|---|---|---|
| Masked L1 pixel | 1.0 | Photometric reconstruction in the face region |
| LPIPS perceptual (VGG) | 1.0 | Perceptual similarity via learned features |
| Mask coverage | 0.5 | Penalizes rendering outside the foreground mask |
| Offset regularization | 0.1 | Prevents Gaussian drift from the FLAME mesh |
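A sketch of how these terms might be combined, using the `lpips` package for the perceptual term. The mask-coverage and offset-regularization formulations are assumptions, since the card only gives their weights and one-line descriptions.

```python
import torch
import lpips  # pip install lpips

perceptual = lpips.LPIPS(net="vgg")  # expects inputs scaled to [-1, 1]

def total_loss(render, target, mask, offsets):
    """Weighted sum of the four loss terms from the table above.

    render/target: (B, 3, H, W) in [0, 1]; mask: (B, 1, H, W) foreground
    mask; offsets: per-Gaussian displacements from the FLAME surface.
    """
    # Masked L1: photometric error inside the face region only.
    l1 = (mask * (render - target).abs()).sum() / mask.sum().clamp(min=1)
    # LPIPS on [-1, 1]-scaled images.
    percep = perceptual(render * 2 - 1, target * 2 - 1).mean()
    # Mask coverage (assumed form): penalize rendered intensity
    # landing outside the foreground mask.
    coverage = ((1 - mask) * render).abs().mean()
    # Offset regularization (assumed form): keep Gaussians near the mesh.
    offset_reg = offsets.pow(2).mean()
    return 1.0 * l1 + 1.0 * percep + 0.5 * coverage + 0.1 * offset_reg
```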
### Training Metrics

| Metric | Value |
|---|---|
| Final loss | 0.966 |
| Best loss | 0.638 |
| Best pixel L1 | 0.104 |
| Best perceptual (LPIPS) | 0.492 |
| Epoch mean (final) | 1.003 |
| Epoch mean (best, epoch 16) | 0.982 |
## Usage

### Loading with the SoulSplats pipeline

The weights are loaded automatically when placed at the default path (or when `LAM_LORA_WEIGHTS` points to them explicitly):

```bash
mkdir -p checkpoints/lora_latest
cp lora_weights.pt checkpoints/lora_latest/lora_weights.pt
export LAM_LORA_WEIGHTS=/path/to/lora_weights.pt
```
### Loading programmatically

```python
from scripts.finetune_lora import load_lora_weights

model = load_lam_model()  # your LAM model constructor
load_lora_weights(model, "lora_weights.pt", rank=16, alpha=16.0)
```
### Standalone LoRA application

```python
import torch
from scripts.finetune_lora import LoRALinear, apply_lora_to_model

# Wrap the model's attention projections with LoRA layers, then copy
# the trained A/B matrices into them.
lora_layers = apply_lora_to_model(model, rank=16, alpha=16.0)
weights = torch.load("lora_weights.pt", map_location="cpu", weights_only=True)
for name, layer in lora_layers.items():
    layer.lora_A.data = weights[f"{name}.lora_A"].to(layer.lora_A.device)
    layer.lora_B.data = weights[f"{name}.lora_B"].to(layer.lora_B.device)
```
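For reference, a LoRA linear layer in the standard formulation computes `W x + (alpha / rank) * B A x`. Below is a minimal sketch of that idea; it is not necessarily identical to the `LoRALinear` shipped in `scripts/finetune_lora.py`.

```python
import torch
import torch.nn as nn

class LoRALinearSketch(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only A and B are trained
        # Standard LoRA init: A small random, B zero, so the adapter
        # starts as a no-op on the base weights.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank  # rank=16, alpha=16 -> scale 1.0

    def forward(self, x):
        # W x + scale * B (A x)
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)
```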
## Files

| File | Size | Description |
|---|---|---|
| `lora_weights.pt` | 7.6 MB | LoRA A/B matrices for all 60 adapted layers |
| `config.json` | 599 B | Training configuration |
| `training_config.yaml` | 3.9 KB | Full training config with resolution upgrade path |
| `training_log.jsonl` | ~25 KB | Per-step training metrics (loss, lr, etc.) |
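The training log is one JSON object per line, so it can be inspected with a few lines of Python. The `loss` field name is taken from the table above; any other field names are assumptions about the log schema.

```python
import json

# Read per-step metrics from the JSONL training log.
with open("training_log.jsonl") as f:
    records = [json.loads(line) for line in f if line.strip()]

losses = [r["loss"] for r in records if "loss" in r]
print(f"steps logged: {len(records)}, best loss: {min(losses):.3f}")
```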
## Intended Use
- One-shot 3D avatar generation from a single face photo
- Improved quality and fairness across diverse skin tones and ethnicities
- Real-time animatable Gaussian splat avatars with FLAME rigging
- Research on equitable 3D face reconstruction
## Limitations

- Trained with a neutral FLAME expression (zero-initialized), so it does not improve expression diversity
- 512x512 input resolution; higher resolutions require Phase 2/3 training on larger GPUs
- ~20K Gaussians (FLAME subdivide=1); Phase 3 targets 60K for finer detail
- The validation loss metric was non-functional during this run (identity camera matrix issue); improvement was confirmed via training loss and visual inspection
## Ethical Considerations

This model was specifically designed to reduce racial bias in 3D avatar generation:
- Training data explicitly balanced across Fitzpatrick skin types I-VI
- Mid-grey background compositing to reduce luminance halo artifacts on dark skin
- Adaptive TTO (Test-Time Optimization) that automatically extends refinement iterations for underrepresented features
- Benchmark harness includes per-Fitzpatrick fairness metrics
## Citation

```bibtex
@misc{soulsplats-lora-v1,
  title={SoulSplats LAM LoRA v1: Diverse Gaussian Splat Avatar Adapter},
  author={AceofSpade81},
  year={2026},
  note={LoRA adapter for LAM (Large Avatar Model) trained on FairFace for ethnic diversity}
}
```
## Acknowledgements