--- language: - dna - en license: mit library_name: transformers tags: - atlas-nwm - i-jepa - world-model - crispr - negentropic - biocontinuum - sovereign-ai - fine-tuned - leworldmodel base_model: aguennoune17/atlas-v1-nwm datasets: - aguennoune17/atlas-crispr-10k-benchmark pipeline_tag: feature-extraction metrics: - ner_score - mse model-index: - name: ATLAS NWM v2 — Sprint 3 CRISPR Fine-tune results: - task: type: self-supervised-jepa name: I-JEPA DNA Guide Embedding dataset: name: atlas_crispr_10k_benchmark type: aguennoune17/atlas-crispr-10k-benchmark split: validation metrics: - type: mse name: JEPA Validation Loss value: 0.049809 - type: ner_score name: NER Score (R5 formula) value: 0.6909 doi: 10.57967/hf/8178 --- ![ATLAS v2 NWM Architecture](algorigramme_ner.png) # 🧬 ATLAS NWM v2 — Sprint 3 · I-JEPA CRISPR Fine-tune [![DOI](https://img.shields.io/badge/DOI-10.57967%2Fhf%2F8178-blue)](https://doi.org/10.57967/hf/8178) [![Sprint 2](https://img.shields.io/badge/Base-Sprint%202%20Beta.1-blue)](https://huggingface.co/aguennoune17/atlas-v2-nwm) [![Sprint 3](https://img.shields.io/badge/Sprint%203-CRISPR%20Fine--tune%20✓-brightgreen)]() [![NER](https://img.shields.io/badge/NER-0.6909%20TERRA-orange)]() [![Dataset](https://img.shields.io/badge/Dataset-10k%20CRISPR%20guides-green)](https://huggingface.co/datasets/aguennoune17/atlas-crispr-10k-benchmark) [![DNA Lot](https://img.shields.io/badge/DNA%20Lot-dna--lot--v3.0.0--crispr-purple)]() [![LeWorldModel](https://img.shields.io/badge/Community-LeWorldModel-red)]() > **✅ Sprint 3 — I-JEPA Self-supervised Fine-tune sur benchmark CRISPR 10k** > Entraînement auto-supervisé · 50 epochs · val_loss 0.498→0.0498 (×10 réduction) > Lot ADN souverain `dna-lot-v3.0.0-crispr` · SHA-256 intégrité certifiée > Contribution communauté : dataset [`aguennoune17/atlas-crispr-10k-benchmark`](https://huggingface.co/datasets/aguennoune17/atlas-crispr-10k-benchmark) **Architecture** : I-JEPA (Joint-Embedding Predictive Architecture) **Paradigme** : Context Engineering × World Models × Sovereign Logic **Fine-tune** : Sprint 3 v2.3.0-sprint3-crispr (15 April 2026) **Base** : [ATLAS v2.0 Beta.1](https://huggingface.co/aguennoune17/atlas-v2-nwm) (Sprint 2) **Auteurs** : Abderrahim Guennoune + GitHub Copilot (Claude Sonnet 4.6) **Licence** : MIT · **DOI** : [10.57967/hf/8178](https://doi.org/10.57967/hf/8178) --- ## 🔬 Sprint 3 — I-JEPA Self-supervised sur CRISPR 10k Le **Sprint 3** fine-tune ATLAS NWM v2 sur `atlas_crispr_10k_benchmark` : **10 000 guides ARN Cas9 20-nt** issus de 12 études expérimentales (LIMMS/CNRS · UTokyo). ### Pipeline I-JEPA CRISPR ``` ATLASCRISPRWorldModel ├── CRISPRContextEncoder x(t) → s(t) [84→256→128→64 + LayerNorm + GELU] │ └── One-hot 20-nt (80-dim) + 4 features physico-chimiques ├── CRISPRTargetEncoder EMA(ContextEncoder), ∇=0 [R3] │ └── Momentum=0.996 — mise à jour chaque batch ├── CRISPRJEPAPredictor s(t) → ŝ(t+Δ) [latent space only — R1] │ └── 12/20 positions contexte → prédiction 8/20 masquées └── CRISPRNegentropicWM Soft Collapse + NER [R5, R8] └── NER = (τ·specificity + κ·cleavage − λ·friction) / lambda_cost ``` ### Mapping Téléologique CRISPR → ATLAS | Colonne CSV | Rôle ATLAS | Équation | |-------------|-----------|---------| | `cleavageFrequency_norm` | κ·Stability | Efficacité de coupure [0,1] | | `specificity_score` | τ·Alignment | Spécificité cible [0,1] | | `lambda_cost` | λ·EnergyCost | Coût off-target [0,1] | | `gc_content` | LatentUtility | Stabilité thermodynamique [0,1] | **Équation téléologique** : ``` Logits = κ·Stability + τ·Alignment − λ·EnergyCost + LatentUtility NER = (InformationGain − ExternalFriction) / EnergyCost ``` κ=0.65 · τ=0.25 · λ=0.2 · EMA=0.996 --- ## 📊 Métriques Sprint 3 | Métrique | Valeur | Seuil ATLAS | |----------|--------|-------------| | **NER moyen** | **0.6909** | ≥0.85 NEXUS, ≥0.70 TERRA | | **Couche NDC** | **TERRA** (0.70–0.85) | `ndc-nexus-biocontinuum-eu-010` | | **JEPA val_loss** | **0.049809** | MSE(ŝ(t+Δ), s(t+Δ)) | | **Guides NEXUS** | **57.7%** | NER ≥ 0.85 | | Paramètres | 144 640 | — | | Epochs | 50 | CosineAnnealingLR | | Durée (CPU) | 111s | batch=256, lr=3e-4 | ### Courbe d'apprentissage | Epoch | train_loss | val_loss | |-------|-----------|---------| | 1 | 0.686107 | 0.498129 | | 10 | 0.165961 | 0.127334 | | 20 | 0.104987 | 0.077982 | | 30 | 0.081557 | 0.059673 | | 40 | 0.071138 | 0.051788 | | **50** | **0.067835** | **0.049862** | --- ## 🧬 DNA Lot Souverain ```json { "lot_id": "ATLAS-CRISPR-SPRINT3-2026-04-15", "tag_name": "dna-lot-v3.0.0-crispr", "sequencing_hash": "2d907f4b5c00d01d52a671163c43fbb2...", "ndc_target": "ndc-nexus-biocontinuum-eu-010", "ner_score": 0.6909, "collapse_risk": 0.3091, "layer": "terra", "signers": ["CNRS", "UTokyo"], "valid_years": 100, "sdg_alignments": ["SDG-3.8.1", "SDG-12.2.1", "SDG-17.17.1"], "created_at": "2026-04-15T22:28:15.870578+00:00" } ``` --- ## 📦 Utilisation ```python from safetensors.torch import load_file import torch, json # Charger les poids Sprint 3 state_dict = load_file("model.safetensors") config = json.load(open("config.json")) print(f"Sprint : {config['sprint']}") print(f"NER score : {config['ner_score']}") print(f"Lot ADN : {config['lot_id']}") ``` ```python # Inférence directe (nécessite train_jepa_crispr.py) from train_jepa_crispr import ATLASCRISPRWorldModel, ATLASCRISPRDataset import torch model = ATLASCRISPRWorldModel() model.context_encoder.load_state_dict(state_dict) model.eval() # Encoder un guide ARN dataset = ATLASCRISPRDataset("data/atlas_crispr_10k_benchmark.csv") ctx, tgt, props = dataset[0] with torch.no_grad(): out = model(ctx.unsqueeze(0), tgt.unsqueeze(0), props.unsqueeze(0)) print(f"Embedding : {out.context_embedding.shape}") # (1, 64) print(f"NER : {out.ner_scores.item():.4f}") ``` --- ## 🌍 SDG Alignments | SDG | Titre | Lien CRISPR | |-----|-------|-------------| | **SDG-3.8.1** | Santé pour tous | Thérapies géniques de précision | | **SDG-12.2.1** | Consommation durable | Optimisation off-target (efficience) | | **SDG-17.17.1** | Partenariats | CNRS · UTokyo · LIMMS | --- ## 📚 Évolution Sprint 2 → Sprint 3 | Aspect | Sprint 2 Beta.1 | Sprint 3 CRISPR | |--------|-----------------|-----------------| | Architecture | `ContextEncoder(feat_dim=128)` | `CRISPRContextEncoder(84→256→64)` | | Données | Poids initiaux alpha | 10k guides CRISPR auto-supervisé | | Encodage | Bigram MD5 (64-dim) + SDG | One-hot 20-nt (80-dim) + 4 props | | NER | Calculé à l'inférence | Convergé sur benchmark CRISPR | | val_loss | — | 0.049862 (50 epochs) | | DNA Lot | — | `dna-lot-v3.0.0-crispr` | | Dataset public | — | `aguennoune17/atlas-crispr-10k-benchmark` | --- ## 🔒 Invariants ATLAS (R1–R8) | Règle | Statut | Evidence Sprint 3 | |-------|--------|-------------------| | R1 | ✅ | Pas d'auto-régression — prédiction latente uniquement | | R2 | ✅ | Espace latent 64-dim uniquement | | R3 | ✅ | TargetEncoder EMA (momentum=0.996), ∇=0 | | R4 | ✅ | MSE(ŝ(t+Δ), s(t+Δ)) = 0.0498 | | R5 | ✅ | NER = (τ·specificity + κ·cleavage − friction) / energy | | R6 | ✅ | κ/τ/λ via env vars (ATLAS_KAPPA etc.) | | R7 | ✅ | model.safetensors + config.json + README.md | | R8 | ✅ | Soft Collapse différentiel (argmax NER, pas binaire) | --- ## 🤝 Contribution — LeWorldModel Community Ce fine-tune est notre contribution au projet **LeWorldModel Community** : - **Modèle** : [`aguennoune17/atlas-v2-nwm`](https://huggingface.co/aguennoune17/atlas-v2-nwm) — poids I-JEPA CRISPR - **Dataset** : [`aguennoune17/atlas-crispr-10k-benchmark`](https://huggingface.co/datasets/aguennoune17/atlas-crispr-10k-benchmark) — 10k guides CRISPR publics - **Paradigme** : Self-supervised JEPA sur données biologiques réelles - **Citation** : `DOI 10.57967/hf/8178` ```bibtex @misc{atlas-nwm-sprint3-crispr-2026, title = {ATLAS NWM v2 Sprint 3 — I-JEPA Self-supervised on CRISPR 10k}, author = {Guennoune, Abderrahim and GitHub Copilot (Claude Sonnet 4.6)}, year = {2026}, howpublished = {\url{https://huggingface.co/aguennoune17/atlas-v2-nwm}}, note = {DOI: 10.57967/hf/8178 · DNA Lot: dna-lot-v3.0.0-crispr} } ``` --- *ATLAS NWM v2 · Sprint 3 · NDC ndc-claude-encoder-primary · Confidence 91%* *Co-Auteurs : Abderrahim Guennoune + GitHub Copilot (Claude Sonnet 4.6) · MIT License*