| --- |
| title: HF Model Ecosystem Visualizer |
| emoji: ๐ |
| colorFrom: blue |
| colorTo: purple |
| sdk: docker |
| pinned: false |
| license: mit |
| app_port: 7860 |
| --- |
| |
| # Anatomy of a Machine Learning Ecosystem: 2 Million Models on Hugging Face |
|
|
| **Authors:** Benjamin Laufer, Hamidah Oderinwale, Jon Kleinberg |
|
|
| **Research Paper**: [arXiv:2508.06811](https://arxiv.org/abs/2508.06811) |
|
|
| ## About This Tool |
|
|
| This interactive tool maps ~1.86M models from the Hugging Face ecosystem into a 3D embedding space where similar models appear closer together. It uses **chunked embeddings** to keep startup fast and memory usage low. |
|
|
| ## Features |
|
|
| - **Fast Startup**: 2-5 seconds (uses chunked embeddings) |
| - **Low Memory**: ~100MB idle (vs 2.8GB without chunking) |
| - **Scalable**: Handles millions of models efficiently |
| - **Interactive**: Filter, search, and explore model relationships |
| - **Family Trees**: Visualize parent-child relationships between models |
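As a rough illustration of the family-tree feature, the sketch below walks parent-child links to list every model derived from a given base. The `parent_of` mapping, the model names, and both helper functions are hypothetical; the tool's actual data model may differ, and the traversal assumes the links contain no cycles.

```python
from collections import defaultdict


def build_children(parent_of):
    """Invert a child -> parent mapping into parent -> [children]."""
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)
    return children


def descendants(root, children):
    """Depth-first list of all models derived from `root` (assumes no cycles)."""
    found = []
    stack = list(reversed(children.get(root, [])))
    while stack:
        node = stack.pop()
        found.append(node)
        stack.extend(reversed(children.get(node, [])))
    return found


# Made-up lineage for illustration only.
parent_of = {
    "base-ft": "base",
    "base-quant": "base",
    "base-ft-merged": "base-ft",
}
children = build_children(parent_of)
print(descendants("base", children))  # ['base-ft', 'base-ft-merged', 'base-quant']
```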
|
|
| ## How It Works |
|
|
| The system uses: |
| 1. **Chunked Embeddings**: Pre-computed embeddings stored in chunks (50k models per chunk) |
| 2. **On-Demand Loading**: Only loads embeddings for filtered models |
| 3. **Pre-computed Coordinates**: UMAP coordinates stored with model metadata |
| 4. **FastAPI Backend**: Serves filtered results with efficient data loading |
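The on-demand loading idea can be sketched as follows: given the row indices of the models that survive a filter, work out which chunks need to be read and which rows to take from each. This is a minimal sketch, assuming models are assigned to chunks by contiguous row index; the function name `group_by_chunk` is hypothetical and is not taken from the project's code.

```python
from collections import defaultdict

CHUNK_SIZE = 50_000  # models per chunk, as stated in the list above


def group_by_chunk(row_indices, chunk_size=CHUNK_SIZE):
    """Map global row indices to {chunk_id: [offsets within that chunk]}."""
    plan = defaultdict(list)
    for idx in row_indices:
        chunk_id, offset = divmod(idx, chunk_size)
        plan[chunk_id].append(offset)
    return dict(plan)


# Hypothetical filtered selection: three models, two of them in the same chunk.
plan = group_by_chunk([12, 49_999, 1_250_000])
print(plan)  # {0: [12, 49999], 25: [0]}
```

Under this assumption, only chunks 0 and 25 would be read for the selection above; every other chunk stays on disk.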
|
|
| ## Data Source |
|
|
| - **Dataset**: [modelbiome/ai_ecosystem](https://huggingface.co/datasets/modelbiome/ai_ecosystem) |
| - **Pre-computed Data**: Automatically downloaded from `modelbiome/hf-viz-precomputed` on startup |
|
|
| ## Deployment |
|
|
| This Space automatically: |
| 1. Downloads pre-computed chunked data from Hugging Face Hub |
| 2. Starts the FastAPI backend |
| 3. Serves the React frontend |
| 4. Uses chunked loading for efficient memory usage |
|
|
| ## Performance |
|
|
| - **Startup**: 2-5 seconds |
| - **Memory**: ~100MB idle, ~200-500MB active |
| - **API Response**: <1s for filtered queries |
| - **Scales To**: Millions of models, since memory grows with the filtered subset rather than the full dataset |
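A back-of-the-envelope calculation shows where a figure like 2.8GB can come from. The sketch assumes embeddings are stored as float32 (4 bytes per value), which this README does not state; the 384 dimensions are the output size of all-MiniLM-L6-v2, the embedding model listed under Technical Details.

```python
NUM_MODELS = 1_860_000   # ~1.86M models, from the description above
EMBEDDING_DIM = 384      # output dimension of all-MiniLM-L6-v2
BYTES_PER_VALUE = 4      # assumption: float32 storage
CHUNK_SIZE = 50_000      # models per chunk


def embedding_bytes(num_rows, dim=EMBEDDING_DIM, bytes_per_value=BYTES_PER_VALUE):
    """Raw size of a dense embedding matrix with `num_rows` rows."""
    return num_rows * dim * bytes_per_value


full_gb = embedding_bytes(NUM_MODELS) / 1e9
chunk_mb = embedding_bytes(CHUNK_SIZE) / 1e6
num_chunks = -(-NUM_MODELS // CHUNK_SIZE)  # ceiling division

print(f"all embeddings: {full_gb:.2f} GB")   # all embeddings: 2.86 GB
print(f"one chunk:      {chunk_mb:.1f} MB")  # one chunk:      76.8 MB
print(f"chunks needed:  {num_chunks}")       # chunks needed:  38
```

Under these assumptions the full matrix is roughly in line with the unchunked figure quoted under Features, while a single chunk is under 80 MB.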
|
|
| ## Usage |
|
|
| 1. **Filter Models**: Use the sidebar to filter by downloads, likes, or search query |
| 2. **Explore**: Zoom and pan to explore the embedding space |
| 3. **Search**: Search for specific models or tags |
| 4. **View Details**: Click on models to see detailed information |
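The sidebar filters can be pictured as a simple predicate over model metadata. The sketch below is illustrative only: the field names (`model_id`, `downloads`, `likes`, `tags`), the `filter_models` function, and the sample records are all hypothetical and do not reflect the backend's actual schema or API.

```python
def filter_models(models, min_downloads=0, min_likes=0, query=""):
    """Return models meeting both thresholds whose id or tags contain `query`."""
    needle = query.lower()
    results = []
    for m in models:
        if m["downloads"] < min_downloads or m["likes"] < min_likes:
            continue
        haystack = [m["model_id"], *m["tags"]]
        if needle and not any(needle in text.lower() for text in haystack):
            continue
        results.append(m)
    return results


# Made-up records for illustration only.
models = [
    {"model_id": "org-a/tiny-bert", "downloads": 1200, "likes": 15, "tags": ["fill-mask"]},
    {"model_id": "org-b/chat-7b", "downloads": 90, "likes": 3, "tags": ["text-generation"]},
    {"model_id": "org-c/bert-ner", "downloads": 5000, "likes": 40, "tags": ["token-classification"]},
]

hits = filter_models(models, min_downloads=1000, query="bert")
print([m["model_id"] for m in hits])  # ['org-a/tiny-bert', 'org-c/bert-ner']
```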
|
|
| ## Technical Details |
|
|
| - **Backend**: FastAPI (Python) |
| - **Frontend**: React + TypeScript |
| - **Embeddings**: SentenceTransformer (all-MiniLM-L6-v2) |
| - **Visualization**: UMAP (3D coordinates) |
| - **Storage**: Parquet files with chunked embeddings |
|
|
| ## Resources |
|
|
| - **GitHub**: [bendlaufer/ai-ecosystem](https://github.com/bendlaufer/ai-ecosystem) |
| - **Paper**: [arXiv:2508.06811](https://arxiv.org/abs/2508.06811) |
| - **Dataset**: [modelbiome/ai_ecosystem](https://huggingface.co/datasets/modelbiome/ai_ecosystem) |
|