---
title: HF Model Ecosystem Visualizer
emoji: 🌐
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---

# Anatomy of a Machine Learning Ecosystem: 2 Million Models on Hugging Face

**Authors:** Benjamin Laufer, Hamidah Oderinwale, Jon Kleinberg

**Research Paper**: [arXiv:2508.06811](https://arxiv.org/abs/2508.06811)

## About This Tool

This interactive tool visualizes ~1.86M models from the Hugging Face ecosystem in a 3D embedding space, where similar models appear closer together. It uses **chunked embeddings** for fast startup and efficient memory usage.

## Features

- **Fast Startup**: 2-5 seconds (uses chunked embeddings)
- **Low Memory**: ~100MB idle (vs 2.8GB without chunking)
- **Scalable**: Handles millions of models efficiently
- **Interactive**: Filter, search, and explore model relationships
- **Family Trees**: Visualize parent-child relationships between models

## How It Works

The system uses:
1. **Chunked Embeddings**: Pre-computed embeddings stored in chunks (50k models per chunk)
2. **On-Demand Loading**: Only loads embeddings for filtered models
3. **Pre-computed Coordinates**: UMAP coordinates stored with model metadata
4. **Fast API**: FastAPI backend with efficient data loading
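The chunk routing behind steps 1–2 can be sketched as follows. The 50k-per-chunk size comes from this README; the function names, file-name pattern, and cache size are illustrative assumptions, not the app's actual code:

```python
from functools import lru_cache

CHUNK_SIZE = 50_000  # models per embedding chunk (from the README)

def chunk_for(model_index: int) -> int:
    """Map a global model index to the chunk that holds its embedding."""
    return model_index // CHUNK_SIZE

@lru_cache(maxsize=8)  # keep only a few chunks resident -> low idle memory
def load_chunk(chunk_id: int) -> str:
    # Illustrative stand-in: the real app would read a Parquet file here,
    # e.g. pd.read_parquet(f"embeddings_chunk_{chunk_id:04d}.parquet")
    return f"embeddings_chunk_{chunk_id:04d}.parquet"

def chunks_for_filter(model_indices) -> list:
    """Load only the chunks needed for a filtered set of models."""
    needed = sorted({chunk_for(i) for i in model_indices})
    return [load_chunk(c) for c in needed]
```

A filter that matches a handful of models touches only the chunks those models fall in, which is why idle memory stays near ~100MB instead of holding all embeddings at once.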

## Data Source

- **Dataset**: [modelbiome/ai_ecosystem](https://huggingface.co/datasets/modelbiome/ai_ecosystem)
- **Pre-computed Data**: Automatically downloaded from `modelbiome/hf-viz-precomputed` on startup

## Deployment

This Space automatically:
1. Downloads pre-computed chunked data from Hugging Face Hub
2. Starts the FastAPI backend
3. Serves the React frontend
4. Uses chunked loading for efficient memory usage
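A minimal launch fragment for the sequence above might look like the following. This is a sketch, not the Space's actual startup script: the repo type, local paths, and app module name are assumptions; only the `modelbiome/hf-viz-precomputed` repo and port 7860 (from the frontmatter's `app_port`) come from this README.

```shell
#!/usr/bin/env sh
# 1. Fetch pre-computed chunked data from the Hub
#    (repo type and target directory are assumptions)
huggingface-cli download modelbiome/hf-viz-precomputed --local-dir ./data

# 2. Start the FastAPI backend, which also serves the built React frontend
#    ("app:app" is an assumed module:variable name)
uvicorn app:app --host 0.0.0.0 --port 7860
```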

## Performance

- **Startup**: 2-5 seconds
- **Memory**: ~100MB idle, ~200-500MB active
- **API Response**: <1s for filtered queries
- **Scales To**: Millions of models (the full ~1.86M-model dataset)

## Usage

1. **Filter Models**: Use the sidebar to filter by downloads, likes, or a search query
2. **Explore**: Zoom and pan to explore the embedding space
3. **Search**: Search for specific models or tags
4. **View Details**: Click on models to see detailed information
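Under the hood, the sidebar filters map to query parameters on the backend. The sketch below shows how such a request URL could be built; the `/api/models` path and parameter names are illustrative assumptions, while the filter fields (downloads, likes, search) mirror the sidebar controls described above:

```python
from urllib.parse import urlencode

def build_filter_url(base: str, min_downloads: int = 0,
                     min_likes: int = 0, search: str = "") -> str:
    """Build a query URL for a hypothetical filter endpoint.

    '/api/models' and the parameter names are assumptions; the fields
    correspond to the sidebar filters (downloads, likes, search).
    """
    params = {"min_downloads": min_downloads, "min_likes": min_likes}
    if search:
        params["search"] = search
    return f"{base}/api/models?{urlencode(params)}"
```

For example, filtering to popular models matching "llama" would produce a single small request, and the backend loads only the embedding chunks those models fall in.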

## Technical Details

- **Backend**: FastAPI (Python)
- **Frontend**: React + TypeScript
- **Embeddings**: SentenceTransformer (all-MiniLM-L6-v2)
- **Visualization**: UMAP (3D coordinates)
- **Storage**: Parquet files with chunked embeddings

## Resources

- **GitHub**: [bendlaufer/ai-ecosystem](https://github.com/bendlaufer/ai-ecosystem)
- **Paper**: [arXiv:2508.06811](https://arxiv.org/abs/2508.06811)
- **Dataset**: [modelbiome/ai_ecosystem](https://huggingface.co/datasets/modelbiome/ai_ecosystem)