# SoulSplats LAM LoRA v1: Diverse Gaussian Splat Avatars

LoRA adapter weights for the LAM (Large Avatar Model) transformer decoder, finetuned on ethnically balanced face data for improved avatar quality across all skin tones.

## Model Details

| Property | Value |
| --- | --- |
| Base model | LAM (aigc3d/LAM, SIGGRAPH 2025): DINOv2 ViT-L/14 encoder + 10-layer transformer decoder + 20K Gaussian splats |
| Adapter type | LoRA (Low-Rank Adaptation) |
| LoRA rank | 16 |
| LoRA alpha | 16.0 |
| Trainable params | 1.97M / 559M total (0.35%) |
| Layers adapted | 60 attention projection layers (Q, K, V, output) in the transformer decoder |
| Precision | bf16 mixed precision |
| Framework | PyTorch |
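The adapter structure above can be sketched as a thin wrapper around each frozen projection layer. This `LoRALinear` is illustrative only (the project's actual class lives in `scripts.finetune_lora`), and the initialization scheme is an assumption, though zero-initializing B so the adapter starts as a no-op is standard LoRA practice:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a rank-r update: y = Wx + (alpha / r) * B(Ax)."""

    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only A and B are trained
        self.scale = alpha / rank
        # A gets a small random init; B starts at zero so the adapter is initially a no-op
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

# At rank 16 on a d-dim projection, each wrapped layer adds 2 * 16 * d trainable params
layer = LoRALinear(nn.Linear(1024, 1024), rank=16, alpha=16.0)
```

With rank 16 and alpha 16.0, the scale factor is 1.0, so the trained update is applied at full strength.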

## Training Details

### Data

- Primary dataset: FairFace (108K images balanced across 7 race groups)
- Curation: 3,000 images balanced to 500 per Fitzpatrick skin type (I-VI) using ITA (Individual Typology Angle) classification
- Preprocessing: center-cropped to square, resized to 512x512, and composited onto a mid-grey (0.5) background to reduce luminance halo artifacts on dark skin tones
- FLAME tracking: default neutral FLAME shape parameters (zero-initialized, 300-dim full PCA space)
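The preprocessing steps above can be sketched with Pillow. The function name and the use of a separate foreground mask are assumptions about the pipeline, not the repo's actual code:

```python
from PIL import Image

MID_GREY = (128, 128, 128)  # 0.5 in [0, 1], per the compositing note above

def preprocess(img: Image.Image, mask: Image.Image, size: int = 512) -> Image.Image:
    """Center-crop to a square, resize, and composite the foreground onto mid-grey."""
    w, h = img.size
    s = min(w, h)
    box = ((w - s) // 2, (h - s) // 2, (w + s) // 2, (h + s) // 2)
    img = img.crop(box).resize((size, size), Image.LANCZOS)
    mask = mask.crop(box).resize((size, size), Image.LANCZOS)
    background = Image.new("RGB", (size, size), MID_GREY)
    # Where the mask is white, keep the face; elsewhere, show the grey background
    return Image.composite(img.convert("RGB"), background, mask.convert("L"))
```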

### Fitzpatrick Distribution

| Type | Count | Skin tone |
| --- | --- | --- |
| I | 500 | Very light |
| II | 500 | Light |
| III | 500 | Medium light |
| IV | 500 | Medium |
| V | 500 | Medium dark |
| VI | 500 | Dark |
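ITA-based binning, used for the curation above, computes an angle from CIELAB lightness L* and the yellow-blue channel b*. The band thresholds below follow the commonly used Del Bino grouping; mapping those six bands onto Fitzpatrick I-VI, as this sketch does, is an approximation rather than a clinical equivalence:

```python
import math

def ita_degrees(L_star: float, b_star: float) -> float:
    """Individual Typology Angle: arctan((L* - 50) / b*), in degrees."""
    return math.degrees(math.atan2(L_star - 50.0, b_star))

# Lower ITA thresholds per type; anything at or below -30 degrees falls in VI
ITA_BANDS = [(55.0, "I"), (41.0, "II"), (28.0, "III"), (10.0, "IV"), (-30.0, "V")]

def fitzpatrick_from_ita(ita: float) -> str:
    """Map an ITA value onto an approximate Fitzpatrick type."""
    for threshold, skin_type in ITA_BANDS:
        if ita > threshold:
            return skin_type
    return "VI"
```

Higher ITA means lighter skin, so Type I corresponds to the largest angles and Type VI to the smallest.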

### Training Configuration

| Parameter | Value |
| --- | --- |
| Optimizer | AdamW (beta1=0.9, beta2=0.95) |
| Learning rate | 1e-4 with cosine decay |
| Warmup steps | 500 |
| Weight decay | 0.01 |
| Batch size | 1 (gradient accumulation 8, effective batch 8) |
| Epochs | 20 |
| Total steps | 6,890 |
| Training time | ~40 hours on RTX 3080 Ti (12 GB) |
| Grad clipping | 1.0 max norm |
| Gradient checkpointing | Enabled |
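The schedule and accumulation setup above can be reproduced with a `LambdaLR` wrapper around AdamW. The linear warmup shape, the decay floor of zero, and the placeholder loss are assumptions consistent with the listed hyperparameters, not the repo's exact training loop:

```python
import math
import torch

WARMUP, TOTAL_STEPS, ACCUM = 500, 6890, 8

def lr_lambda(step: int) -> float:
    """Linear warmup for 500 steps, then cosine decay toward zero."""
    if step < WARMUP:
        return step / WARMUP
    progress = (step - WARMUP) / max(1, TOTAL_STEPS - WARMUP)
    return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))

params = [torch.nn.Parameter(torch.zeros(8))]  # stand-in for the LoRA A/B matrices
opt = torch.optim.AdamW(params, lr=1e-4, betas=(0.9, 0.95), weight_decay=0.01)
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)

# One optimizer step per 8 micro-batches (batch size 1, effective batch 8)
for _ in range(ACCUM):
    loss = (params[0] ** 2).sum() / ACCUM  # placeholder loss, scaled for accumulation
    loss.backward()
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
opt.step()
sched.step()
opt.zero_grad()
```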

### Loss Function

| Component | Weight | Description |
| --- | --- | --- |
| Masked L1 pixel | 1.0 | Photometric reconstruction in the face region |
| LPIPS perceptual (VGG) | 1.0 | Perceptual similarity via learned features |
| Mask coverage | 0.5 | Penalizes rendering outside the foreground mask |
| Offset regularization | 0.1 | Prevents Gaussian drift from the FLAME mesh |
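The four components can be combined as a weighted sum. The exact forms of the coverage and offset terms below are assumptions, and `lpips_fn` stands in for an LPIPS(VGG) module (e.g. from the `lpips` package) rather than any function shipped with this repo:

```python
import torch

WEIGHTS = {"l1": 1.0, "lpips": 1.0, "coverage": 0.5, "offset": 0.1}

def combined_loss(pred, target, mask, offsets, lpips_fn):
    """Weighted sum of the four loss components listed above."""
    # Masked L1: average photometric error inside the foreground mask
    l1 = (mask * (pred - target).abs()).sum() / mask.sum().clamp(min=1.0)
    perceptual = lpips_fn(pred, target)
    coverage = ((1.0 - mask) * pred.abs()).mean()  # assumed: penalize rendering outside the mask
    offset_reg = offsets.pow(2).mean()             # assumed: L2 on per-Gaussian offsets from FLAME
    return (WEIGHTS["l1"] * l1 + WEIGHTS["lpips"] * perceptual
            + WEIGHTS["coverage"] * coverage + WEIGHTS["offset"] * offset_reg)
```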

### Training Metrics

| Metric | Value |
| --- | --- |
| Final loss | 0.966 |
| Best loss | 0.638 |
| Best pixel L1 | 0.104 |
| Best perceptual (LPIPS) | 0.492 |
| Epoch mean (final) | 1.003 |
| Epoch mean (best, epoch 16) | 0.982 |

## Usage

### Loading with SoulSplats pipeline

The weights are loaded automatically when placed at the default path:

```shell
# Copy to the default location
mkdir -p checkpoints/lora_latest
cp lora_weights.pt checkpoints/lora_latest/lora_weights.pt

# Or point the pipeline at the file via an environment variable
export LAM_LORA_WEIGHTS=/path/to/lora_weights.pt
```

### Loading programmatically

```python
from scripts.finetune_lora import load_lora_weights

# After loading the base LAM model
model = load_lam_model()  # Your LAM loading function
load_lora_weights(model, "lora_weights.pt", rank=16, alpha=16.0)

# The model is now ready for inference with LoRA applied
```

### Standalone LoRA application

```python
import torch
from scripts.finetune_lora import LoRALinear, apply_lora_to_model

# Wrap the model's attention projections with LoRA layers
lora_layers = apply_lora_to_model(model, rank=16, alpha=16.0)

# Load the saved A/B matrices into each wrapped layer
weights = torch.load("lora_weights.pt", map_location="cpu", weights_only=True)
for name, layer in lora_layers.items():
    layer.lora_A.data = weights[f"{name}.lora_A"].to(layer.lora_A.device)
    layer.lora_B.data = weights[f"{name}.lora_B"].to(layer.lora_B.device)
```

## Files

| File | Size | Description |
| --- | --- | --- |
| lora_weights.pt | 7.6 MB | LoRA A/B matrices for all 60 adapted layers |
| config.json | 599 B | Training configuration |
| training_config.yaml | 3.9 KB | Full training config with resolution upgrade path |
| training_log.jsonl | ~25 KB | Per-step training metrics (loss, lr, etc.) |
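The per-step log can be scanned for the best-performing step with a few lines of Python. The field names `step` and `loss` are assumptions about the JSONL schema based on the description above:

```python
import json

def best_step(path: str = "training_log.jsonl", key: str = "loss") -> dict:
    """Return the logged record with the lowest value under `key`."""
    best = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the log
            record = json.loads(line)
            if key in record and (best is None or record[key] < best[key]):
                best = record
    return best
```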

## Intended Use

- One-shot 3D avatar generation from a single face photo
- Improved quality and fairness across diverse skin tones and ethnicities
- Real-time animatable Gaussian splat avatars with FLAME rigging
- Research on equitable 3D face reconstruction

## Limitations

- Trained with a neutral FLAME expression (zero-initialized), so it does not improve expression diversity
- 512x512 input resolution; higher resolutions require Phase 2/3 training on larger GPUs
- ~20K Gaussians (FLAME subdivide=1); Phase 3 targets 60K for finer detail
- The validation loss metric was non-functional during this run (identity camera matrix issue); training loss and visual quality confirmed the improvement

## Ethical Considerations

This model was specifically designed to reduce racial bias in 3D avatar generation:

- Training data explicitly balanced across Fitzpatrick skin types I-VI
- Mid-grey background compositing to reduce luminance halo artifacts on dark skin
- Adaptive TTO (Test-Time Optimization) that automatically extends refinement iterations for underrepresented features
- A benchmark harness that includes per-Fitzpatrick fairness metrics

## Citation

```bibtex
@misc{soulsplats-lora-v1,
  title={SoulSplats LAM LoRA v1: Diverse Gaussian Splat Avatar Adapter},
  author={AceofSpade81},
  year={2026},
  note={LoRA adapter for LAM (Large Avatar Model) trained on FairFace for ethnic diversity}
}
```

