# SoulSplats LAM LoRA v1: Diverse Gaussian Splat Avatars

LoRA adapter weights for the LAM (Large Avatar Model) transformer decoder, finetuned on ethnically balanced face data for improved avatar quality across all skin tones.

## Model Details

| Property | Value |
| --- | --- |
| Base model | LAM (aigc3d/LAM, SIGGRAPH 2025): DINOv2 ViT-L/14 encoder + 10-layer transformer decoder + 20K Gaussian splats |
| Adapter type | LoRA (Low-Rank Adaptation) |
| LoRA rank | 16 |
| LoRA alpha | 16.0 |
| Trainable params | 1.97M / 559M total (0.35%) |
| Layers adapted | 60 attention projection layers (Q, K, V, output) in the transformer decoder |
| Precision | bf16 mixed precision |
| Framework | PyTorch |
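The adapter structure above can be sketched as a thin wrapper around each frozen projection layer. This `LoRALinear` is illustrative only (the project's actual class lives in `scripts.finetune_lora`), and the initialization scheme is an assumption, though zero-initializing B so the adapter starts as a no-op is standard LoRA practice:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a rank-r update: y = Wx + (alpha / r) * B(Ax)."""

    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only A and B are trained
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero so the adapter is initially a no-op
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

# At rank 16 on a d-dim projection, each wrapped layer adds 2 * 16 * d trainable params
layer = LoRALinear(nn.Linear(1024, 1024), rank=16, alpha=16.0)
```

With rank 16 and alpha 16.0, the scale factor is 1.0, so the trained update is applied at full strength.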

## Training Details

### Data

- Primary dataset: FairFace (108K images balanced across 7 race groups)
- Curation: 3,000 images balanced to 500 per Fitzpatrick skin type (I-VI) using ITA (Individual Typology Angle) classification
- Preprocessing: center-cropped to square, resized to 512x512, and composited onto a mid-grey (0.5) background to reduce luminance halo artifacts on dark skin tones
- FLAME tracking: default neutral FLAME shape parameters (zero-initialized, 300-dim full PCA space)
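The preprocessing steps above can be sketched with Pillow. The function name and the use of a separate foreground mask are assumptions about the pipeline, not the repo's actual code:

```python
from PIL import Image

MID_GREY = (128, 128, 128)  # 0.5 in [0, 1], per the compositing note above

def preprocess(img: Image.Image, mask: Image.Image, size: int = 512) -> Image.Image:
    """Center-crop to a square, resize, and composite the foreground onto mid-grey."""
    w, h = img.size
    s = min(w, h)
    box = ((w - s) // 2, (h - s) // 2, (w + s) // 2, (h + s) // 2)
    img = img.crop(box).resize((size, size), Image.LANCZOS)
    mask = mask.crop(box).resize((size, size), Image.LANCZOS)
    background = Image.new("RGB", (size, size), MID_GREY)
    # Where the mask is white, keep the face; elsewhere, show the grey background
    return Image.composite(img.convert("RGB"), background, mask.convert("L"))
```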

### Fitzpatrick Distribution

| Type | Count | Skin tone |
| --- | --- | --- |
| I | 500 | Very light |
| II | 500 | Light |
| III | 500 | Medium light |
| IV | 500 | Medium |
| V | 500 | Medium dark |
| VI | 500 | Dark |
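ITA-based binning, used for the curation above, computes an angle from CIELAB lightness L* and the yellow-blue channel b*. The band thresholds below follow the commonly used Del Bino grouping; mapping those six bands onto Fitzpatrick I-VI, as this sketch does, is an approximation rather than a clinical equivalence:

```python
import math

def ita_degrees(L_star: float, b_star: float) -> float:
    """Individual Typology Angle: arctan((L* - 50) / b*), in degrees."""
    return math.degrees(math.atan2(L_star - 50.0, b_star))

# Lower ITA thresholds per type; anything at or below -30 degrees falls in VI
ITA_BANDS = [(55.0, "I"), (41.0, "II"), (28.0, "III"), (10.0, "IV"), (-30.0, "V")]

def fitzpatrick_from_ita(ita: float) -> str:
    """Map an ITA value onto an approximate Fitzpatrick type."""
    for threshold, skin_type in ITA_BANDS:
        if ita > threshold:
            return skin_type
    return "VI"
```

Higher ITA means lighter skin, so Type I corresponds to the largest angles and Type VI to the smallest.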

### Training Configuration

| Parameter | Value |
| --- | --- |
| Optimizer | AdamW (beta1=0.9, beta2=0.95) |
| Learning rate | 1e-4 with cosine decay |
| Warmup steps | 500 |
| Weight decay | 0.01 |
| Batch size | 1 (gradient accumulation 8, effective batch 8) |
| Epochs | 20 |
| Total steps | 6,890 |
| Training time | ~40 hours on RTX 3080 Ti (12 GB) |
| Grad clipping | 1.0 max norm |
| Gradient checkpointing | Enabled |
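The schedule and accumulation setup above can be reproduced with a `LambdaLR` wrapper around AdamW. The linear warmup shape, the decay floor of zero, and the placeholder loss are assumptions consistent with the listed hyperparameters, not the repo's exact training loop:

```python
import math
import torch

WARMUP, TOTAL_STEPS, ACCUM = 500, 6890, 8

def lr_lambda(step: int) -> float:
    """Linear warmup for 500 steps, then cosine decay toward zero."""
    if step < WARMUP:
        return step / WARMUP
    progress = (step - WARMUP) / max(1, TOTAL_STEPS - WARMUP)
    return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))

params = [torch.nn.Parameter(torch.zeros(8))]  # stand-in for the LoRA A/B matrices
opt = torch.optim.AdamW(params, lr=1e-4, betas=(0.9, 0.95), weight_decay=0.01)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)

# One optimizer step per 8 micro-batches (batch size 1, effective batch 8)
for _ in range(ACCUM):
    loss = (params[0] ** 2).sum() / ACCUM  # placeholder loss, scaled for accumulation
    loss.backward()
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
opt.step()
sched.step()
opt.zero_grad()
```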

### Loss Function

| Component | Weight | Description |
| --- | --- | --- |
| Masked L1 pixel | 1.0 | Photometric reconstruction in the face region |
| LPIPS perceptual (VGG) | 1.0 | Perceptual similarity via learned features |
| Mask coverage | 0.5 | Penalizes rendering outside the foreground mask |
| Offset regularization | 0.1 | Prevents Gaussian drift from the FLAME mesh |
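The four components can be combined as a weighted sum. The exact forms of the coverage and offset terms below are assumptions, and `lpips_fn` stands in for an LPIPS(VGG) module (e.g. from the `lpips` package) rather than any function shipped with this repo:

```python
import torch

WEIGHTS = {"l1": 1.0, "lpips": 1.0, "coverage": 0.5, "offset": 0.1}

def combined_loss(pred, target, mask, offsets, lpips_fn):
    """Weighted sum of the four loss components listed above."""
    # Masked L1: average photometric error inside the foreground mask
    l1 = (mask * (pred - target).abs()).sum() / mask.sum().clamp(min=1.0)
    perceptual = lpips_fn(pred, target)
    coverage = ((1.0 - mask) * pred.abs()).mean()  # assumed: penalize rendering outside the mask
    offset_reg = offsets.pow(2).mean()             # assumed: L2 on per-Gaussian offsets from FLAME
    return (WEIGHTS["l1"] * l1 + WEIGHTS["lpips"] * perceptual
            + WEIGHTS["coverage"] * coverage + WEIGHTS["offset"] * offset_reg)
```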

### Training Metrics

| Metric | Value |
| --- | --- |
| Final loss | 0.966 |
| Best loss | 0.638 |
| Best pixel L1 | 0.104 |
| Best perceptual (LPIPS) | 0.492 |
| Epoch mean (final) | 1.003 |
| Epoch mean (best, epoch 16) | 0.982 |

## Usage

### Loading with SoulSplats pipeline

The weights are loaded automatically when placed at the default path:

```shell
# Copy to the default location
mkdir -p checkpoints/lora_latest
cp lora_weights.pt checkpoints/lora_latest/lora_weights.pt

# Or point the pipeline at the file via an environment variable
export LAM_LORA_WEIGHTS=/path/to/lora_weights.pt
```

### Loading programmatically

```python
from scripts.finetune_lora import load_lora_weights

# After loading the base LAM model
model = load_lam_model()  # Your LAM loading function
load_lora_weights(model, "lora_weights.pt", rank=16, alpha=16.0)

# The model is now ready for inference with LoRA applied
```

### Standalone LoRA application

```python
import torch
from scripts.finetune_lora import LoRALinear, apply_lora_to_model

# Wrap the model's attention projections with LoRA layers
lora_layers = apply_lora_to_model(model, rank=16, alpha=16.0)

# Load the saved A/B matrices into each wrapped layer
weights = torch.load("lora_weights.pt", map_location="cpu", weights_only=True)
for name, layer in lora_layers.items():
    layer.lora_A.data = weights[f"{name}.lora_A"].to(layer.lora_A.device)
    layer.lora_B.data = weights[f"{name}.lora_B"].to(layer.lora_B.device)
```

## Files

| File | Size | Description |
| --- | --- | --- |
| lora_weights.pt | 7.6 MB | LoRA A/B matrices for all 60 adapted layers |
| config.json | 599 B | Training configuration |
| training_config.yaml | 3.9 KB | Full training config with resolution upgrade path |
| training_log.jsonl | ~25 KB | Per-step training metrics (loss, lr, etc.) |
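The per-step log can be scanned for the best-performing step with a few lines of Python. The field names `step` and `loss` are assumptions about the JSONL schema based on the description above:

```python
import json

def best_step(path: str = "training_log.jsonl", key: str = "loss") -> dict:
    """Return the logged record with the lowest value under `key`."""
    best = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the log
            record = json.loads(line)
            if key in record and (best is None or record[key] < best[key]):
                best = record
    return best
```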

## Intended Use

- One-shot 3D avatar generation from a single face photo
- Improved quality and fairness across diverse skin tones and ethnicities
- Real-time animatable Gaussian splat avatars with FLAME rigging
- Research on equitable 3D face reconstruction

## Limitations

- Trained with a neutral FLAME expression (zero-initialized), so it does not improve expression diversity
- 512x512 input resolution; higher resolutions require Phase 2/3 training on larger GPUs
- ~20K Gaussians (FLAME subdivide=1); Phase 3 targets 60K for finer detail
- The validation loss metric was non-functional during this run (identity camera matrix issue); training loss and visual quality confirmed the improvement

## Ethical Considerations

This model was specifically designed to reduce racial bias in 3D avatar generation:

- Training data explicitly balanced across Fitzpatrick skin types I-VI
- Mid-grey background compositing to reduce luminance halo artifacts on dark skin
- Adaptive TTO (Test-Time Optimization) that automatically extends refinement iterations for underrepresented features
- A benchmark harness that includes per-Fitzpatrick fairness metrics

## Citation

```bibtex
@misc{soulsplats-lora-v1,
  title={SoulSplats LAM LoRA v1: Diverse Gaussian Splat Avatar Adapter},
  author={AceofSpade81},
  year={2026},
  note={LoRA adapter for LAM (Large Avatar Model) trained on FairFace for ethnic diversity}
}
```

