🧬 ATLAS NWM v2 — Sprint 3 · I-JEPA CRISPR Fine-tune

✅ Sprint 3: I-JEPA self-supervised fine-tune on the CRISPR 10k benchmark
Self-supervised training · 50 epochs · val_loss 0.498→0.0498 (10× reduction)
Sovereign DNA lot dna-lot-v3.0.0-crispr · integrity certified by SHA-256
Community contribution: dataset aguennoune17/atlas-crispr-10k-benchmark

Architecture: I-JEPA (Joint-Embedding Predictive Architecture)
Paradigm: Context Engineering × World Models × Sovereign Logic
Fine-tune: Sprint 3 v2.3.0-sprint3-crispr (15 April 2026)
Base: ATLAS v2.0 Beta.1 (Sprint 2)
Authors: Abderrahim Guennoune + GitHub Copilot (Claude Sonnet 4.6)
License: MIT · DOI: 10.57967/hf/8178

🔬 Sprint 3: self-supervised I-JEPA on CRISPR 10k

Sprint 3 fine-tunes ATLAS NWM v2 on atlas_crispr_10k_benchmark: 10,000 20-nt Cas9 guide RNAs drawn from 12 experimental studies (LIMMS/CNRS · UTokyo).

I-JEPA CRISPR pipeline

ATLASCRISPRWorldModel
├── CRISPRContextEncoder  x(t) → s(t)    [84→256→128→64  + LayerNorm + GELU]
│   └── One-hot 20-nt (80-dim) + 4 physicochemical features
├── CRISPRTargetEncoder   EMA(ContextEncoder), ∇=0  [R3]
│   └── Momentum=0.996, updated every batch
├── CRISPRJEPAPredictor   s(t) → ŝ(t+Δ)  [latent space only, R1]
│   └── 12/20 context positions → prediction of the 8/20 masked positions
└── CRISPRNegentropicWM   Soft Collapse + NER  [R5, R8]
    └── NER = (τ·specificity + κ·cleavage − λ·friction) / lambda_cost
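
To make the shapes in the diagram concrete, here is a minimal sketch of how an 84-value input (80 one-hot values plus 4 physicochemical features) and a 12/8 context/target split could be built. The channel order `ACGT`, the random choice of masked positions, the example guide, and the helper names `one_hot_guide`, `build_input` and `split_positions` are all assumptions for illustration; train_jepa_crispr.py may do this differently.

```python
import random

NUCLEOTIDES = "ACGT"  # assumed channel order; U is treated as T


def one_hot_guide(seq):
    """Flatten a 20-nt guide into 80 values (4 channels per position)."""
    seq = seq.upper().replace("U", "T")
    if len(seq) != 20:
        raise ValueError("expected a 20-nt guide")
    vec = []
    for base in seq:
        channel = NUCLEOTIDES.index(base)  # raises ValueError on unknown bases
        vec.extend(1.0 if i == channel else 0.0 for i in range(4))
    return vec


def build_input(seq, features):
    """Concatenate the one-hot guide with 4 scalar features (84 values)."""
    if len(features) != 4:
        raise ValueError("expected 4 features")
    return one_hot_guide(seq) + [float(f) for f in features]


def split_positions(n_positions=20, n_masked=8, seed=0):
    """Pick 8 masked (target) positions; the remaining 12 form the context."""
    rng = random.Random(seed)
    masked = sorted(rng.sample(range(n_positions), n_masked))
    context = [p for p in range(n_positions) if p not in masked]
    return context, masked


# Made-up guide and feature values, for illustration only
x = build_input("GACGTTAGCCATGGCTAACG", [0.8, 0.6, 0.5, 0.5])
context, masked = split_positions()
print(len(x), len(context), len(masked))  # 84 12 8
```

The predictor would then see only the latent representation of the 12 context positions and be trained to match the target encoder's representation of the 8 masked ones.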

Teleological mapping CRISPR → ATLAS

CSV column	ATLAS role	Meaning [range]
`cleavageFrequency_norm`	κ·Stability	Cleavage efficiency [0,1]
`specificity_score`	τ·Alignment	Target specificity [0,1]
`lambda_cost`	λ·EnergyCost	Off-target cost [0,1]
`gc_content`	LatentUtility	Thermodynamic stability [0,1]

Teleological equation:

Logits = κ·Stability + τ·Alignment − λ·EnergyCost + LatentUtility
NER    = (InformationGain − ExternalFriction) / EnergyCost

κ=0.65 · τ=0.25 · λ=0.2 · EMA=0.996
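
As a worked example, the two formulas can be evaluated for a single row using the column mapping and coefficients given here. This is only a sketch: the row values are made up, `friction` is passed explicitly because the mapping table does not tie it to a CSV column, and the `eps` guard against a zero `lambda_cost` is an assumption rather than documented behaviour.

```python
KAPPA, TAU, LAMBDA = 0.65, 0.25, 0.2  # coefficients listed in the model card


def teleological_logit(row):
    """Logits = kappa*Stability + tau*Alignment - lambda*EnergyCost + LatentUtility."""
    return (KAPPA * row["cleavageFrequency_norm"]
            + TAU * row["specificity_score"]
            - LAMBDA * row["lambda_cost"]
            + row["gc_content"])


def ner(row, friction, eps=1e-8):
    """NER = (InformationGain - ExternalFriction) / EnergyCost."""
    information_gain = (TAU * row["specificity_score"]
                        + KAPPA * row["cleavageFrequency_norm"])
    external_friction = LAMBDA * friction  # `friction` is a hypothetical input
    energy_cost = max(row["lambda_cost"], eps)  # assumed guard against division by zero
    return (information_gain - external_friction) / energy_cost


# Made-up row, for illustration only
row = {
    "cleavageFrequency_norm": 0.8,
    "specificity_score": 0.6,
    "lambda_cost": 0.9,
    "gc_content": 0.5,
}
print(f"logit = {teleological_logit(row):.4f}")    # logit = 0.9900
print(f"NER   = {ner(row, friction=0.1):.4f}")     # NER   = 0.7222
```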

📊 Sprint 3 metrics

Metric	Value	ATLAS threshold / notes
Mean NER	0.6909	≥0.85 NEXUS, ≥0.70 TERRA
NDC layer	TERRA (0.70–0.85)	`ndc-nexus-biocontinuum-eu-010`
JEPA val_loss	0.049809	MSE(ŝ(t+Δ), s(t+Δ))
NEXUS guides	57.7%	NER ≥ 0.85
Parameters	144,640	n/a
Epochs	50	CosineAnnealingLR
Duration (CPU)	111 s	batch=256, lr=3e-4
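
For readers unfamiliar with the scheduler named in the table, the sketch below evaluates the closed-form cosine annealing schedule in plain Python. It assumes `T_max` equals the 50 training epochs and that the minimum learning rate is 0 (the PyTorch default for `eta_min`); neither value is stated in the table, and `cosine_annealing_lr` is a hypothetical helper name.

```python
import math


def cosine_annealing_lr(epoch, base_lr=3e-4, t_max=50, eta_min=0.0):
    """Closed-form cosine annealing: decays base_lr to eta_min over t_max epochs."""
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


# Learning rate at a few checkpoints of the 50-epoch run
for epoch in (0, 10, 25, 40, 50):
    print(f"epoch {epoch:>2}: lr = {cosine_annealing_lr(epoch):.2e}")
```

Under these assumptions the rate starts at 3e-4, passes through half that value at epoch 25, and reaches zero at epoch 50.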

Learning curve

Epoch	train_loss	val_loss
1	0.686107	0.498129
10	0.165961	0.127334
20	0.104987	0.077982
30	0.081557	0.059673
40	0.071138	0.051788
50	0.067835	0.049862
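
The "10× reduction" quoted in the summary can be checked directly from the table. The short script below uses only the validation losses listed above and also prints the gain between consecutive checkpoints, which makes the diminishing returns visible.

```python
# Validation losses copied from the learning-curve table
val_loss = {
    1: 0.498129,
    10: 0.127334,
    20: 0.077982,
    30: 0.059673,
    40: 0.051788,
    50: 0.049862,
}

reduction = val_loss[1] / val_loss[50]
print(f"overall reduction: {reduction:.2f}x")  # overall reduction: 9.99x

# Gain between consecutive checkpoints
epochs = sorted(val_loss)
for prev, curr in zip(epochs, epochs[1:]):
    step_gain = val_loss[prev] / val_loss[curr]
    print(f"epoch {prev:>2} -> {curr:>2}: {step_gain:.2f}x")
```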

🧬 Sovereign DNA lot

{
  "lot_id":          "ATLAS-CRISPR-SPRINT3-2026-04-15",
  "tag_name":        "dna-lot-v3.0.0-crispr",
  "sequencing_hash": "2d907f4b5c00d01d52a671163c43fbb2...",
  "ndc_target":      "ndc-nexus-biocontinuum-eu-010",
  "ner_score":       0.6909,
  "collapse_risk":   0.3091,
  "layer":           "terra",
  "signers":         ["CNRS", "UTokyo"],
  "valid_years":     100,
  "sdg_alignments":  ["SDG-3.8.1", "SDG-12.2.1", "SDG-17.17.1"],
  "created_at":      "2026-04-15T22:28:15.870578+00:00"
}
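
A SHA-256 integrity check of the kind this lot relies on can be sketched with the standard library. Which file `sequencing_hash` covers is not stated here, and the hash shown above is truncated, so the sketch compares only a known prefix; the helper names `sha256_of_file` and `matches_lot_hash` are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lot_hash(path, expected_prefix):
    """True if the file's digest starts with the (possibly truncated) expected hash."""
    return sha256_of_file(path).startswith(expected_prefix.lower())


# Demo on a temporary file, since the hashed artifact is not identified here
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"ACGT")
    demo_path = tmp.name
try:
    demo_digest = sha256_of_file(demo_path)
    demo_ok = matches_lot_hash(demo_path, demo_digest[:32])
    print(demo_ok)  # True
finally:
    os.remove(demo_path)
```

Against the real artifact, the call would look like `matches_lot_hash(path_to_artifact, "2d907f4b5c00d01d52a671163c43fbb2")`, with the caveat that a prefix match is weaker than comparing the full 64-character digest.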

📦 Usage

import json

import torch
from safetensors.torch import load_file

# Load the Sprint 3 weights and configuration
state_dict = load_file("model.safetensors")
with open("config.json", encoding="utf-8") as f:
    config = json.load(f)

print(f"Sprint    : {config['sprint']}")
print(f"NER score : {config['ner_score']}")
print(f"DNA lot   : {config['lot_id']}")

# Direct inference (requires train_jepa_crispr.py on the import path)
from train_jepa_crispr import ATLASCRISPRWorldModel, ATLASCRISPRDataset

model = ATLASCRISPRWorldModel()
model.context_encoder.load_state_dict(state_dict)
model.eval()

# Encode one guide RNA
dataset = ATLASCRISPRDataset("data/atlas_crispr_10k_benchmark.csv")
ctx, tgt, props = dataset[0]
with torch.no_grad():
    out = model(ctx.unsqueeze(0), tgt.unsqueeze(0), props.unsqueeze(0))
    print(f"Embedding : {out.context_embedding.shape}")  # (1, 64)
    print(f"NER       : {out.ner_scores.item():.4f}")

🌍 SDG Alignments

SDG	Title	CRISPR link
SDG-3.8.1	Health for all	Precision gene therapies
SDG-12.2.1	Sustainable consumption	Off-target optimization (efficiency)
SDG-17.17.1	Partnerships	CNRS · UTokyo · LIMMS

📚 Evolution from Sprint 2 to Sprint 3

Aspect	Sprint 2 Beta.1	Sprint 3 CRISPR
Architecture	`ContextEncoder(feat_dim=128)`	`CRISPRContextEncoder(84→256→64)`
Data	Initial alpha weights	10k CRISPR guides, self-supervised
Encoding	Bigram MD5 (64-dim) + SDG	One-hot 20-nt (80-dim) + 4 props
NER	Computed at inference	Converged on the CRISPR benchmark
val_loss	n/a	0.049862 (50 epochs)
DNA lot	n/a	`dna-lot-v3.0.0-crispr`
Public dataset	n/a	`aguennoune17/atlas-crispr-10k-benchmark`

🔒 ATLAS invariants (R1–R8)

Rule	Status	Sprint 3 evidence
R1	✅	No autoregression: latent prediction only
R2	✅	64-dim latent space only
R3	✅	TargetEncoder EMA (momentum=0.996), ∇=0
R4	✅	MSE(ŝ(t+Δ), s(t+Δ)) = 0.0498
R5	✅	NER = (τ·specificity + κ·cleavage − friction) / energy
R6	✅	κ/τ/λ via env vars (ATLAS_KAPPA etc.)
R7	✅	model.safetensors + config.json + README.md
R8	✅	Differential Soft Collapse (argmax over NER, not binary)
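
Rule R3 describes a target encoder that receives no gradients and instead tracks the context encoder through an exponential moving average. The update can be sketched in plain Python as below; the parameter names and values are made up, and the real model presumably applies the same rule to its PyTorch tensors.

```python
def ema_update(target, context, momentum=0.996):
    """In-place EMA: target <- momentum * target + (1 - momentum) * context."""
    for name, ctx_values in context.items():
        tgt_values = target[name]
        for i, c in enumerate(ctx_values):
            tgt_values[i] = momentum * tgt_values[i] + (1.0 - momentum) * c


# Toy parameters (hypothetical names and values)
context_params = {"layer.weight": [1.0, -2.0], "layer.bias": [0.5]}
target_params = {name: [0.0] * len(vals) for name, vals in context_params.items()}

# With a fixed context encoder, the target converges towards it geometrically
n_steps = 1000
for _ in range(n_steps):
    ema_update(target_params, context_params)

print(target_params["layer.bias"][0])
```

With momentum 0.996 the target moves slowly: after `n` updates against a fixed context value `c` starting from zero, it equals `c * (1 - 0.996**n)`. In PyTorch the same update is typically run under `torch.no_grad()` over the paired parameters of the two modules.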

🤝 Contribution: LeWorldModel Community

This fine-tune is our contribution to the LeWorldModel Community project:

Model: aguennoune17/atlas-v2-nwm (I-JEPA CRISPR weights)
Dataset: aguennoune17/atlas-crispr-10k-benchmark (10k public CRISPR guides)
Paradigm: self-supervised JEPA on real biological data
Citation: DOI 10.57967/hf/8178

@misc{atlas-nwm-sprint3-crispr-2026,
  title     = {ATLAS NWM v2 Sprint 3 — I-JEPA Self-supervised on CRISPR 10k},
  author    = {Guennoune, Abderrahim and GitHub Copilot (Claude Sonnet 4.6)},
  year      = {2026},
  howpublished = {\url{https://huggingface.co/aguennoune17/atlas-v2-nwm}},
  note      = {DOI: 10.57967/hf/8178 · DNA Lot: dna-lot-v3.0.0-crispr}
}

ATLAS NWM v2 · Sprint 3 · NDC ndc-claude-encoder-primary · Confidence 91%
Co-authors: Abderrahim Guennoune + GitHub Copilot (Claude Sonnet 4.6) · MIT License
