docs: Update model card for v2.5 - Performance Optimized Edition
README.md
CHANGED
|
@@ -2,485 +2,165 @@
|
|
| 2 |
license: apache-2.0
|
| 3 |
language:
|
| 4 |
- en
|
|
|
|
| 5 |
tags:
|
| 6 |
-
-
|
| 7 |
-
- code-generation
|
| 8 |
- claude-code
|
| 9 |
-
- sona
|
| 10 |
-
- swarm
|
| 11 |
-
- multi-agent
|
| 12 |
-
- gguf
|
| 13 |
-
- quantized
|
| 14 |
-
- edge-ai
|
| 15 |
-
- self-learning
|
| 16 |
-
- ruvector
|
| 17 |
- embeddings
|
| 18 |
-
-
|
| 19 |
-
-
|
| 20 |
-
-
|
| 21 |
-
- triplet-loss
|
| 22 |
-
- infonce
|
| 23 |
-
- agent-routing
|
| 24 |
-
- sota
|
| 25 |
-
- task-routing
|
| 26 |
-
- semantic-search
|
| 27 |
-
- ecosystem
|
| 28 |
-
library_name: ruvllm
|
| 29 |
-
pipeline_tag: text-classification
|
| 30 |
-
base_model: Qwen/Qwen2.5-0.5B-Instruct
|
| 31 |
datasets:
|
| 32 |
-
-
|
| 33 |
-
|
| 34 |
-
- name: RuvLTRA Claude Code 0.5B
|
| 35 |
-
results:
|
| 36 |
-
- task:
|
| 37 |
-
type: text-classification
|
| 38 |
-
name: Agent Routing
|
| 39 |
-
dataset:
|
| 40 |
-
type: custom
|
| 41 |
-
name: Claude Flow Routing Triplets
|
| 42 |
-
metrics:
|
| 43 |
-
- type: accuracy
|
| 44 |
-
value: 0.882
|
| 45 |
-
name: Embedding-Only Accuracy
|
| 46 |
-
- type: accuracy
|
| 47 |
-
value: 1.0
|
| 48 |
-
name: Hybrid Routing Accuracy
|
| 49 |
-
- type: accuracy
|
| 50 |
-
value: 0.812
|
| 51 |
-
name: Hard Negative Accuracy
|
| 52 |
-
widget:
|
| 53 |
-
- text: "Route: Implement authentication\nAgent:"
|
| 54 |
-
example_title: Code Task
|
| 55 |
-
- text: "Route: Review the pull request\nAgent:"
|
| 56 |
-
example_title: Review Task
|
| 57 |
-
- text: "Route: Fix the null pointer bug\nAgent:"
|
| 58 |
-
example_title: Debug Task
|
| 59 |
-
- text: "Route: Design database schema\nAgent:"
|
| 60 |
-
example_title: Architecture Task
|
| 61 |
-
---
|
| 62 |
-
|
| 63 |
-
# RuvLTRA v2.4 - Ecosystem Edition
|
| 64 |
-
|
| 65 |
-
<p align="center">
|
| 66 |
-
<img src="https://img.shields.io/badge/Hybrid_Routing-100%25-brightgreen" alt="Hybrid Accuracy">
|
| 67 |
-
<img src="https://img.shields.io/badge/Embedding-88.2%25-green" alt="Embedding Accuracy">
|
| 68 |
-
<img src="https://img.shields.io/badge/GGUF-Q4__K__M-blue" alt="GGUF">
|
| 69 |
-
<img src="https://img.shields.io/badge/Latency-<10ms-orange" alt="Latency">
|
| 70 |
-
<img src="https://img.shields.io/badge/Capabilities-388-cyan" alt="Capabilities">
|
| 71 |
-
<img src="https://img.shields.io/badge/License-Apache%202.0-green" alt="License">
|
| 72 |
-
<img src="https://img.shields.io/badge/Version-v2.4-purple" alt="Version">
|
| 73 |
-
</p>
|
| 74 |
-
|
| 75 |
-
**RuvLTRA** is a collection of optimized models designed for **local routing, embeddings, and task classification** in Claude Code workflows - achieving **100% routing accuracy** with hybrid strategy.
|
| 76 |
-
|
| 77 |
-
## What's New in v2.4 (Ecosystem Edition)
|
| 78 |
-
|
| 79 |
-
- **2,545 training triplets** (1,078 SOTA + 1,467 ecosystem-specific)
|
| 80 |
-
- **Full ecosystem coverage**: claude-flow, agentic-flow, ruvector
|
| 81 |
-
- **388 total capabilities** across all tools
|
| 82 |
-
- **62 validation tests** with 100% accuracy
|
| 83 |
-
- **30-epoch SOTA training** with 88.2% embedding accuracy
|
| 84 |
-
|
| 85 |
-
## Key Philosophy
|
| 86 |
-
|
| 87 |
-
> **Benchmark Note:** HumanEval/MBPP don't apply here. RuvLTRA isn't designed to compete with Claude for code generation from scratch.
|
| 88 |
-
|
| 89 |
-
### Use Case Comparison
|
| 90 |
-
|
| 91 |
-
| Task | RuvLTRA | Claude API |
|
| 92 |
-
|------|---------|------------|
|
| 93 |
-
| Route task to correct agent | Local, fast, **100% accuracy** | Overkill |
|
| 94 |
-
| Generate embeddings for HNSW | Purpose-built | No embedding API |
|
| 95 |
-
| Quick classification/routing | <10ms local | ~500ms+ API |
|
| 96 |
-
| Memory retrieval scoring | Integrated | Not designed for |
|
| 97 |
-
| Complex code generation | Use Claude | Optimal |
|
| 98 |
-
| Multi-step reasoning | Use Claude | Optimal |
|
| 99 |
-
|
| 100 |
-
---
|
| 101 |
-
|
| 102 |
-
## SOTA: 100% Routing Accuracy
|
| 103 |
-
|
| 104 |
-
Using **hybrid keyword+embedding strategy** plus **contrastive fine-tuning**, RuvLTRA achieves:
|
| 105 |
-
|
| 106 |
-
### SOTA Benchmark Results
|
| 107 |
-
|
| 108 |
-
| Metric | Before | After | Method |
|
| 109 |
-
|--------|--------|-------|--------|
|
| 110 |
-
| **Hybrid Routing** | 95% | **100%** | Keyword-First + Embedding Fallback |
|
| 111 |
-
| **Embedding-Only** | 45% | **88.2%** | Contrastive Learning (Triplet + InfoNCE) |
|
| 112 |
-
| **Hard Negatives** | N/A | **81.2%** | Claude Opus 4.5 Generated Pairs |
|
| 113 |
-
|
| 114 |
-
### Strategy Comparison (20 test cases)
|
| 115 |
-
|
| 116 |
-
| Strategy | RuvLTRA | Qwen Base | Improvement |
|
| 117 |
-
|----------|---------|-----------|-------------|
|
| 118 |
-
| Embedding Only | 88.2% | 40.0% | +48.2 pts |
|
| 119 |
-
| **Keyword-First Hybrid** | **100.0%** | 95.0% | +5 pts |
|
| 120 |
-
|
| 121 |
-
### v2.4 Training Enhancements
|
| 122 |
-
|
| 123 |
-
| Feature | v2.3 | v2.4 |
|
| 124 |
-
|---------|------|------|
|
| 125 |
-
| Training Triplets | 1,078 | **2,545** |
|
| 126 |
-
| Ecosystem Coverage | Claude Flow only | **Full ecosystem** |
|
| 127 |
-
| Total Capabilities | 179 | **388** |
|
| 128 |
-
| Validation Tests | 20 | **62** |
|
| 129 |
-
| Hard Negative Ratio | 18% | **18%** |
|
| 130 |
-
| Training Epochs | 20 | **30** |
|
| 131 |
-
|
| 132 |
-
### Ecosystem Coverage (v2.4)
|
| 133 |
-
|
| 134 |
-
| Tool | CLI Commands | Agents | Special Features |
|
| 135 |
-
|------|--------------|--------|------------------|
|
| 136 |
-
| **claude-flow** | 26 (179 subcommands) | 58 types | 27 hooks, 12 workers, 29 skills |
|
| 137 |
-
| **agentic-flow** | 17 commands | 33 types | 32 MCP tools, 9 RL algorithms |
|
| 138 |
-
| **ruvector** | 6 CLI, 22 Rust crates | 12 NPM | 6 attention, 4 graph algorithms |
|
| 139 |
-
|
| 140 |
---
|
| 141 |
|
| 142 |
-
#
|
| 143 |
|
| 144 |
-
|
| 145 |
-
|-----------|------------|---------------|---------|
|
| 146 |
-
| Task routing | $0.003 / call | $0 | **100%** |
|
| 147 |
-
| Embedding generation | $0.0001 / call | $0 | **100%** |
|
| 148 |
-
| Latency | ~500ms | <10ms | **50x faster** |
|
| 149 |
|
| 150 |
-
|
| 151 |
|
| 152 |
-
|
| 153 |
|
| 154 |
-
|
|
| 155 |
|
| 156 |
-
|
| 157 |
-
|-------|------|-----|---------|
|
| 158 |
-
| `ruvltra-claude-code-0.5b-q4_k_m.gguf` | 398 MB | ~500 MB | <10ms |
|
| 159 |
-
| `ruvltra-small-0.5b-q4_k_m.gguf` | 398 MB | ~500 MB | <10ms |
|
| 160 |
-
| `ruvltra-medium-1.1b-q4_k_m.gguf` | 800 MB | ~1 GB | <20ms |
|
| 161 |
|
| 162 |
-
|
|
| 163 |
|
| 164 |
-
##
|
| 165 |
|
| 166 |
-
|
| 167 |
-
|
| 168 |
-
|
| 169 |
-
|
| 170 |
-
|
| 171 |
-
```
|
| 172 |
|
| 173 |
-
###
|
| 174 |
-
```javascript
|
| 175 |
-
const { SemanticRouter } = require('@ruvector/ruvllm');
|
| 176 |
|
| 177 |
-
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
|
| 181 |
-
});
|
| 182 |
|
| 183 |
-
|
| 184 |
-
// { agent: 'coder', confidence: 0.92 }
|
| 185 |
-
```
|
| 186 |
|
| 187 |
-
###
|
| 188 |
-
```bash
|
| 189 |
-
wget https://huggingface.co/ruv/ruvltra/resolve/main/ruvltra-claude-code-0.5b-q4_k_m.gguf
|
| 190 |
-
```
|
| 191 |
|
| 192 |
-
### Python Example
|
| 193 |
```python
|
| 194 |
-
from
|
| 195 |
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
|
| 199 |
-
|
|
|
|
| 200 |
|
| 201 |
-
#
|
| 202 |
-
```rust
|
| 203 |
-
use ruvllm::backends::{create_backend, GenerateParams};
|
| 204 |
-
|
| 205 |
-
let mut llm = create_backend();
|
| 206 |
-
llm.load_model("ruvltra-claude-code-0.5b-q4_k_m.gguf", Default::default())?;
|
| 207 |
-
|
| 208 |
-
let agent = llm.generate("Route: fix bug\nAgent:", GenerateParams::default().with_max_tokens(8))?;
|
| 209 |
```
|
| 210 |
|
| 211 |
-
|
| 212 |
-
|
| 213 |
-
## Hybrid Routing Algorithm
|
| 214 |
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
|
| 223 |
-
|
| 224 |
-
|
| 225 |
-
|
| 226 |
-
|
| 227 |
-
|
|
|
|
| 228 |
```
|
| 229 |
|
| 230 |
-
|
| 231 |
-
|
| 232 |
-
## Supported Agent Types (58+)
|
| 233 |
-
|
| 234 |
-
| Agent | Keywords | Use Cases |
|
| 235 |
-
|-------|----------|-----------|
|
| 236 |
-
| `coder` | implement, build, create | Code implementation |
|
| 237 |
-
| `researcher` | research, investigate, explore | Information gathering |
|
| 238 |
-
| `reviewer` | review, pull request, quality | Code review |
|
| 239 |
-
| `tester` | test, unit, integration | Testing |
|
| 240 |
-
| `architect` | design, architecture, schema | System design |
|
| 241 |
-
| `security-architect` | security, vulnerability, xss | Security analysis |
|
| 242 |
-
| `debugger` | debug, fix, bug, error | Bug fixing |
|
| 243 |
-
| `documenter` | jsdoc, comment, readme | Documentation |
|
| 244 |
-
| `refactorer` | refactor, async/await | Code refactoring |
|
| 245 |
-
| `optimizer` | optimize, cache, performance | Performance |
|
| 246 |
-
| `devops` | deploy, ci/cd, kubernetes | DevOps |
|
| 247 |
-
| `api-docs` | openapi, swagger, api spec | API documentation |
|
| 248 |
-
| `planner` | sprint, plan, roadmap | Project planning |
|
| 249 |
-
|
| 250 |
-
### Extended Capabilities (v2.4)
|
| 251 |
-
|
| 252 |
-
| Category | Examples |
|
| 253 |
-
|----------|----------|
|
| 254 |
-
| **MCP Tools** | memory_store, agent_spawn, swarm_init, hooks_pre-task |
|
| 255 |
-
| **Swarm Topologies** | hierarchical, mesh, ring, star, adaptive |
|
| 256 |
-
| **Consensus** | byzantine, raft, gossip, crdt, quorum |
|
| 257 |
-
| **Learning** | SONA train, LoRA finetune, EWC++ consolidate, GRPO optimize |
|
| 258 |
-
| **Attention** | flash, multi-head, linear, hyperbolic, MoE |
|
| 259 |
-
| **Graph** | mincut, GNN embed, spectral, pagerank |
|
| 260 |
-
| **Hardware** | Metal GPU, NEON SIMD, ANE neural engine |
|
| 261 |
-
|
| 262 |
-
---
|
| 263 |
|
| 264 |
-
|
|
|
|
| 265 |
|
| 266 |
-
|
| 267 |
-
|
| 268 |
-
|
| 269 |
-
|
| 270 |
-
| Embedding Dimensions | 896 |
|
| 271 |
-
| Quantization | Q4_K_M |
|
| 272 |
-
| File Size | 398 MB |
|
| 273 |
-
| Context Length | 32768 tokens |
|
| 274 |
-
|
| 275 |
-
---
|
| 276 |
|
| 277 |
-
|
| 278 |
-
|
| 279 |
-
|
| 280 |
-
|-------|-------------|
|
| 281 |
-
| **ruvllm** | LLM runtime with SONA learning |
|
| 282 |
-
| **ruvector-core** | HNSW vector database |
|
| 283 |
-
| **ruvector-sona** | Self-optimizing neural architecture |
|
| 284 |
-
| **ruvector-attention** | Attention mechanisms |
|
| 285 |
-
| **ruvector-gnn** | Graph neural network on HNSW |
|
| 286 |
-
| **ruvector-graph** | Distributed hypergraph database |
|
| 287 |
-
|
| 288 |
-
```toml
|
| 289 |
-
[dependencies]
|
| 290 |
-
ruvllm = "0.1"
|
| 291 |
-
ruvector-core = { version = "0.1", features = ["hnsw", "simd"] }
|
| 292 |
-
ruvector-sona = { version = "0.1", features = ["serde-support"] }
|
| 293 |
```
|
| 294 |
|
| 295 |
-
|
| 296 |
|
| 297 |
-
|
| 298 |
|
| 299 |
-
|
|
| 300 |
-
|-----------|---------|
|
| 301 |
-
|
|
| 302 |
-
|
|
| 303 |
-
|
|
| 304 |
-
| Node | 18+ |
|
| 305 |
|
| 306 |
-
|
| 307 |
|
| 308 |
-
|
|
|
|
|
|
|
|
|
|
| 309 |
|
| 310 |
-
|
| 311 |
-
Task --> RuvLTRA --> Agent Type --> Claude API
|
| 312 |
-
(free) (100% acc) (pay here)
|
| 313 |
|
| 314 |
-
|
| 315 |
-
|
| 316 |
-
|
|
|
|
|
|
|
| 317 |
|
| 318 |
-
|
| 319 |
|
| 320 |
-
|
| 321 |
|
| 322 |
-
|
| 323 |
-
<summary><b>Training Details</b></summary>
|
| 324 |
|
| 325 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 326 |
|
| 327 |
-
|
| 328 |
-
|---------|-------|-------------|
|
| 329 |
-
| Base Triplets | 578 | Claude Code routing examples |
|
| 330 |
-
| Claude Hard Negatives (Batch 1) | 100 | Opus 4.5 generated confusing pairs |
|
| 331 |
-
| Claude Hard Negatives (Batch 2) | 400 | Additional confusing pairs |
|
| 332 |
-
| Ecosystem Triplets | 1,467 | Full ecosystem coverage |
|
| 333 |
-
| **Total v2.4** | **2,545** | Combined training set |
|
| 334 |
|
| 335 |
-
|
| 336 |
|
| 337 |
-
|
| 338 |
-
Pipeline: Hard Negative Generation -> Contrastive Training -> GRPO Feedback -> GGUF Export
|
| 339 |
-
|
| 340 |
-
1. Generate confusing agent pairs using Claude Opus 4.5
|
| 341 |
-
2. Train with Triplet Loss + InfoNCE Loss
|
| 342 |
-
3. Apply GRPO reward scaling from Claude judgments
|
| 343 |
-
4. Export adapter weights for GGUF merging
|
| 344 |
-
```
|
| 345 |
-
|
| 346 |
-
### Hyperparameters
|
| 347 |
-
|
| 348 |
-
| Parameter | Value |
|
| 349 |
-
|-----------|-------|
|
| 350 |
-
| Learning Rate | 2e-5 |
|
| 351 |
-
| Batch Size | 32 |
|
| 352 |
-
| Epochs | 30 |
|
| 353 |
-
| Triplet Margin | 0.5 |
|
| 354 |
-
| InfoNCE Temperature | 0.07 |
|
| 355 |
-
| Weight Decay | 0.01 |
|
| 356 |
-
| Optimizer | AdamW |
|
| 357 |
-
|
| 358 |
-
### Training Infrastructure
|
| 359 |
-
|
| 360 |
-
- **Hardware**: Apple Silicon (Metal GPU)
|
| 361 |
-
- **Framework**: Candle (Rust ML)
|
| 362 |
-
- **Training Time**: ~30 seconds for 30 epochs
|
| 363 |
-
- **Final Loss**: 0.168
|
| 364 |
-
|
| 365 |
-
</details>
|
| 366 |
-
|
| 367 |
-
<details>
|
| 368 |
-
<summary><b>Evaluation Results</b></summary>
|
| 369 |
-
|
| 370 |
-
### Benchmark: Claude Flow Agent Routing (20 test cases)
|
| 371 |
-
|
| 372 |
-
| Strategy | RuvLTRA | Qwen Base | Improvement |
|
| 373 |
-
|----------|---------|-----------|-------------|
|
| 374 |
-
| Embedding Only | 88.2% | 40.0% | **+48.2 pts** |
|
| 375 |
-
| Keyword Only | 100.0% | 100.0% | same |
|
| 376 |
-
| Hybrid 60/40 | 100.0% | 95.0% | +5.0 pts |
|
| 377 |
-
| **Keyword-First** | **100.0%** | 95.0% | **+5.0 pts** |
|
| 378 |
-
|
| 379 |
-
### Per-Agent Accuracy
|
| 380 |
-
|
| 381 |
-
| Agent | Accuracy | Test Cases |
|
| 382 |
-
|-------|----------|------------|
|
| 383 |
-
| coder | 100% | 3 |
|
| 384 |
-
| researcher | 100% | 2 |
|
| 385 |
-
| reviewer | 100% | 2 |
|
| 386 |
-
| tester | 100% | 2 |
|
| 387 |
-
| architect | 100% | 2 |
|
| 388 |
-
| security-architect | 100% | 2 |
|
| 389 |
-
| debugger | 100% | 2 |
|
| 390 |
-
| documenter | 100% | 1 |
|
| 391 |
-
| refactorer | 100% | 1 |
|
| 392 |
-
| optimizer | 100% | 1 |
|
| 393 |
-
| devops | 100% | 1 |
|
| 394 |
-
| api-docs | 100% | 1 |
|
| 395 |
-
|
| 396 |
-
### Hard Negative Performance
|
| 397 |
-
|
| 398 |
-
| Confusing Pair | Accuracy |
|
| 399 |
-
|----------------|----------|
|
| 400 |
-
| coder vs refactorer | 82% |
|
| 401 |
-
| researcher vs architect | 79% |
|
| 402 |
-
| reviewer vs tester | 84% |
|
| 403 |
-
| debugger vs optimizer | 78% |
|
| 404 |
-
| documenter vs api-docs | 85% |
|
| 405 |
-
|
| 406 |
-
</details>
|
| 407 |
-
|
| 408 |
-
<details>
|
| 409 |
-
<summary><b>Limitations & Intended Use</b></summary>
|
| 410 |
-
|
| 411 |
-
### Intended Use
|
| 412 |
-
|
| 413 |
-
**Designed For:**
|
| 414 |
-
- Task routing in Claude Code workflows
|
| 415 |
-
- Agent classification (58+ types)
|
| 416 |
-
- Semantic embedding for HNSW search
|
| 417 |
-
- Local inference (<10ms latency)
|
| 418 |
-
- Cost optimization (avoid API calls for routing)
|
| 419 |
-
|
| 420 |
-
**NOT Designed For:**
|
| 421 |
-
- General code generation
|
| 422 |
-
- Multi-step reasoning
|
| 423 |
-
- Chat/conversation
|
| 424 |
-
- Languages other than English
|
| 425 |
-
- Agent types beyond supported set
|
| 426 |
-
|
| 427 |
-
### Known Limitations
|
| 428 |
-
|
| 429 |
-
1. **Fixed Agent Types**: Routes to predefined agents
|
| 430 |
-
2. **English Only**: Training data is English-only
|
| 431 |
-
3. **Domain Specific**: Optimized for software development tasks
|
| 432 |
-
4. **Embedding Fallback**: 88.2% accuracy when keywords don't match
|
| 433 |
-
5. **Context Length**: Optimal for short task descriptions (<100 tokens)
|
| 434 |
-
|
| 435 |
-
</details>
|
| 436 |
-
|
| 437 |
-
<details>
|
| 438 |
-
<summary><b>Version History</b></summary>
|
| 439 |
-
|
| 440 |
-
| Version | Date | Changes |
|
| 441 |
-
|---------|------|---------|
|
| 442 |
-
| **v2.4** | 2025-01-21 | Ecosystem Edition: 2,545 triplets, 388 capabilities, 62 tests |
|
| 443 |
-
| v2.3 | 2025-01-20 | 500+ hard negatives, 48% ratio, GRPO feedback |
|
| 444 |
-
| v2.2 | 2025-01-15 | 100 hard negatives, 18% ratio |
|
| 445 |
-
| v2.1 | 2025-01-10 | Contrastive learning, triplet loss |
|
| 446 |
-
| v2.0 | 2025-01-05 | Hybrid routing strategy |
|
| 447 |
-
| v1.0 | 2024-12-20 | Initial release |
|
| 448 |
-
|
| 449 |
-
</details>
|
| 450 |
-
|
| 451 |
-
<details>
|
| 452 |
-
<summary><b>Citation</b></summary>
|
| 453 |
-
|
| 454 |
-
### BibTeX
|
| 455 |
|
| 456 |
```bibtex
|
| 457 |
@software{ruvltra2025,
|
| 458 |
-
|
| 459 |
-
|
| 460 |
year = {2025},
|
| 461 |
-
|
| 462 |
-
|
| 463 |
-
license = {Apache-2.0},
|
| 464 |
-
keywords = {agent-routing, embeddings, claude-code, contrastive-learning, ecosystem}
|
| 465 |
}
|
| 466 |
```
|
| 467 |
-
|
| 468 |
-
</details>
|
| 469 |
-
|
| 470 |
-
---
|
| 471 |
-
|
| 472 |
-
## License
|
| 473 |
-
|
| 474 |
-
Apache 2.0 - Free for commercial and personal use.
|
| 475 |
-
|
| 476 |
-
## Links
|
| 477 |
-
|
| 478 |
-
- [GitHub Repository](https://github.com/ruvnet/ruvector)
|
| 479 |
-
- [Claude Flow](https://github.com/ruvnet/claude-flow)
|
| 480 |
-
- [Documentation](https://github.com/ruvnet/ruvector/tree/main/docs)
|
| 481 |
-
- [Training Code](https://github.com/ruvnet/ruvector/tree/main/crates/ruvllm/src/training)
|
| 482 |
-
- [NPM Package](https://www.npmjs.com/package/@ruvector/ruvllm)
|
| 483 |
-
|
| 484 |
-
## Keywords
|
| 485 |
-
|
| 486 |
-
`agent-routing` `task-classification` `claude-code` `embeddings` `semantic-search` `gguf` `quantized` `edge-ai` `local-inference` `contrastive-learning` `triplet-loss` `infonce` `qwen` `llm` `mlops` `cost-optimization` `multi-agent` `swarm` `ruvector` `sona` `ecosystem`
|
|
|
|
license: apache-2.0
language:
- en
library_name: ruvllm
tags:
- agent-routing
- claude-code
- embeddings
- gguf
- rust
- llm-inference
datasets:
- ruvnet/claude-flow-routing
pipeline_tag: text-generation
|
| 16 |
---
|
| 17 |
|
| 18 |
+
# RuvLTRA - Optimized Agent Routing Model
|
| 19 |
|
| 20 |
+
## v2.5 - Performance Optimized Edition
|
|
|
|
|
|
|
|
|
|
|
|
|
| 21 |
|
| 22 |
+
RuvLTRA is a purpose-built model family optimized for Claude Code agent routing, featuring HNSW-indexed pattern matching, zero-copy caching, and SIMD-accelerated inference.
|
| 23 |
|
| 24 |
+
### What's New in v2.5
|
| 25 |
|
| Optimization | Description | Improvement |
|--------------|-------------|-------------|
| **HNSW Index** | Hierarchical Navigable Small World graphs | 10x faster search at 10k entries |
| **O(1) LRU Cache** | Using Rust `lru` crate | 23.5 ns cache lookups |
| **Zero-Copy** | `Arc<str>` string interning | 100-1000x cache improvement |
| **Batch SIMD** | AVX2/NEON vectorization | 4x throughput |
| **Memory Pools** | Arena allocation | 50% fewer allocations |
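
The O(1) LRU cache row above can be illustrated with a small sketch. This is not the `lru` crate or the ruvllm implementation; it is a Python stand-in built on `collections.OrderedDict`, and the class name `LRUCache` and the cached task strings are assumptions made for illustration only.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache with O(1) get/put (illustrative sketch)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


# Hypothetical usage: cache routing decisions keyed by task text.
cache = LRUCache(capacity=2)
cache.put("fix null pointer bug", "debugger")
cache.put("design database schema", "architect")
cache.get("fix null pointer bug")                 # refreshes this entry
cache.put("review the pull request", "reviewer")  # evicts "design database schema"
```

Both operations touch only the ends of the ordered map, which is what keeps lookups and evictions constant-time regardless of cache size.
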
|
| 33 |
|
| 34 |
+
### Benchmarks
|
|
|
|
|
|
|
|
|
|
|
|
|
| 35 |
|
| Operation | Performance |
|-----------|-------------|
| Query decomposition | 340 ns |
| Cache lookup | 23.5 ns |
| Memory search (10k entries) | ~0.4 ms |
| Pattern retrieval | <25 µs |
| Routing accuracy (hybrid) | **100%** |
| Routing accuracy (embedding-only) | 45% |
|
| 44 |
|
| 45 |
+
### Models
|
| 46 |
|
| File | Size | Purpose | Context |
|------|------|---------|---------|
| `ruvltra-claude-code-0.5b-q4_k_m.gguf` | 398 MB | Agent routing | 32K |
| `ruvltra-small-0.5b-q4_k_m.gguf` | ~400 MB | General embeddings | 32K |
| `ruvltra-medium-3b-q4_k_m.gguf` | ~2 GB | Full LLM inference | 256K |
|
|
|
|
| 52 |
|
| 53 |
+
### Architecture
|
|
|
|
|
|
|
| 54 |
|
| Model | Parameters | Hidden | Layers | GQA | Features |
|-------|------------|--------|--------|-----|----------|
| RuvLTRA-Small | 494M | 896 | 24 | 7:1 | SONA hooks, HNSW routing |
| RuvLTRA-Medium | 3.0B | 2560 | 42 | 8:1 | Flash Attention 2, Speculative Decode |
|
|
|
|
| 59 |
|
| 60 |
+
### Usage
|
|
|
|
|
|
|
| 61 |
|
| 62 |
+
#### Python (HuggingFace Hub)
|
|
|
|
|
|
|
|
|
|
| 63 |
|
|
|
|
```python
from huggingface_hub import hf_hub_download

# Download the Claude Code routing model
model_path = hf_hub_download(
    repo_id="ruv/ruvltra",
    filename="ruvltra-claude-code-0.5b-q4_k_m.gguf"
)

# Use with llama.cpp or other GGUF-compatible runtimes
```
|
| 75 |
|
| 76 |
+
#### Rust (ruvllm crate)
|
|
|
|
|
|
|
| 77 |
|
```rust
use ruvllm::hub::{ModelDownloader, DownloadConfig};

// The `?` operator below assumes this code runs inside a function returning a `Result`.

// Download from Hub
let downloader = ModelDownloader::new(DownloadConfig::default());
let model_path = downloader.download(
    "ruv/ruvltra",
    Some("./models"),
)?;

// Load and use
use ruvllm::prelude::*;
let mut backend = CandleBackend::with_device(DeviceType::Metal)?;
backend.load_gguf(&model_path, ModelConfig::default())?;
```
|
| 93 |
|
| 94 |
+
#### JavaScript/TypeScript (npm)
|
| 95 |
|
```typescript
import { RuvLLM } from '@ruvector/ruvllm';

const llm = new RuvLLM({
  model: 'ruv/ruvltra',
  quantization: 'q4_k_m'
});

const result = await llm.route('implement authentication with JWT');
console.log(result.recommendedAgent); // 'coder'
console.log(result.confidence); // 0.95
```
|
| 108 |
|
| 109 |
+
### Claude Code Integration
|
| 110 |
|
| 111 |
+
RuvLTRA powers the intelligent 3-tier routing system in Claude Flow:
|
| 112 |
|
| Tier | Handler | Latency | Use Cases |
|------|---------|---------|-----------|
| **1** | Agent Booster | <1ms | Simple transforms (var->const, add-types) |
| **2** | Haiku | ~500ms | Simple tasks, bug fixes |
| **3** | Sonnet/Opus | 2-5s | Architecture, security, complex reasoning |
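
As a rough illustration, the tier table can be encoded as a lookup. The category labels (`simple-transform`, `simple-task`, `complex-reasoning`), the `Tier` dataclass, and the rule that unknown categories fall back to tier 3 are all assumptions made for this sketch; the card does not specify how Claude Flow actually assigns a task to a tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    level: int
    handler: str
    typical_latency: str


# Values copied from the tier table; the dictionary keys are hypothetical labels.
TIERS = {
    "simple-transform": Tier(1, "Agent Booster", "<1ms"),
    "simple-task": Tier(2, "Haiku", "~500ms"),
    "complex-reasoning": Tier(3, "Sonnet/Opus", "2-5s"),
}


def pick_tier(category):
    """Return the tier for a task category.

    Unknown categories fall back to the most capable tier (an assumption).
    """
    return TIERS.get(category, TIERS["complex-reasoning"])


print(pick_tier("simple-transform").handler)
print(pick_tier("security-audit").handler)
```

Falling back to the slowest tier trades latency for safety; a real router might instead ask the routing model for a category first.
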
|
|
|
|
| 118 |
|
| 119 |
+
**Routing accuracy comparison:**
|
| 120 |
|
| Strategy | RuvLTRA | Qwen Base |
|----------|---------|-----------|
| Embedding Only | 45% | 40% |
| Keyword-First (Hybrid) | **100%** | 95% |
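
A minimal sketch of what a keyword-first hybrid strategy could look like: try keyword matches first and fall back to embedding similarity only when no keyword hits. The keyword lists, the agent descriptions, and the bag-of-words `embed` function are illustrative assumptions; the real router uses the model's learned embeddings and its own vocabulary.

```python
import math
import re
from collections import Counter

# Illustrative keyword lists, not the model's actual vocabulary.
AGENT_KEYWORDS = {
    "coder": ["implement", "build", "create"],
    "reviewer": ["review", "pull request"],
    "debugger": ["debug", "fix", "bug"],
    "architect": ["design", "architecture", "schema"],
}

# One short, hypothetical description per agent, used only by the fallback path.
AGENT_DESCRIPTIONS = {
    "coder": "write new code for a feature",
    "reviewer": "check code quality of a change",
    "debugger": "find the cause of a crash or error",
    "architect": "plan system structure and data model",
}


def embed(text):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(task):
    """Return (agent, strategy) for a task description."""
    lowered = task.lower()
    # 1) Keyword-first: the first agent with a matching keyword wins.
    for agent, keywords in AGENT_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return agent, "keyword"
    # 2) Fallback: nearest agent description by cosine similarity.
    query = embed(task)
    best = max(
        AGENT_DESCRIPTIONS,
        key=lambda agent: cosine(query, embed(AGENT_DESCRIPTIONS[agent])),
    )
    return best, "embedding"


print(route("Implement authentication"))
print(route("Why does the app crash on startup?"))
```

Plain substring matching is deliberately naive here (for example, "fix" also matches "prefix"); a production router would likely tokenize before matching.
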
|
| 125 |
|
| 126 |
+
### Training Data
|
|
|
|
|
|
|
| 127 |
|
The Claude Code routing model was trained on:
- 381 labeled examples covering 60+ agent types
- 793 contrastive pairs for embedding fine-tuning
- Synthetic data generated via claude-code-synth.js
- LoRA fine-tuning on task-specific adapters
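
For readers unfamiliar with contrastive fine-tuning, here is a small sketch of the two losses that the previous revision of this card named (triplet loss with margin 0.5 and InfoNCE with temperature 0.07). Using cosine distance, plain Python lists as vectors, and these function names are assumptions for illustration; the actual training code is not shown in this card.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=0.5):
    """max(0, d(a, p) - d(a, n) + margin), with d = 1 - cosine similarity."""
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def info_nce_loss(anchor, positive, negatives, temperature=0.07):
    """Negative log-probability of the positive among positive + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# Hypothetical 2-D embeddings: the positive matches the anchor, the negative is orthogonal.
anchor, positive, negative = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
print(triplet_loss(anchor, positive, negative))
print(info_nce_loss(anchor, positive, [negative]))
```

Both losses are near zero when the positive is already much closer to the anchor than the negatives, and grow as a negative becomes more similar than the positive.
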
|
| 133 |
|
| 134 |
+
### Performance Targets
|
| 135 |
|
| Metric | Target | Status |
|--------|--------|--------|
| Flash Attention | 2.49x-7.47x speedup | Achieved |
| HNSW Search | 150x-12,500x faster | Achieved |
| Memory Reduction | 50-75% with quantization | Achieved |
| MCP Response | <100ms | Achieved |
| SONA Adaptation | <0.05ms | Achieved |
|
| 143 |
|
| 144 |
+
### Links
|
|
|
|
| 145 |
|
- **Crate**: [crates.io/crates/ruvllm](https://crates.io/crates/ruvllm)
- **npm**: [npmjs.com/package/@ruvector/ruvllm](https://www.npmjs.com/package/@ruvector/ruvllm)
- **Docs**: [docs.rs/ruvllm](https://docs.rs/ruvllm)
- **GitHub**: [github.com/ruvnet/ruvector](https://github.com/ruvnet/ruvector)
- **Claude Flow**: [github.com/ruvnet/claude-flow](https://github.com/ruvnet/claude-flow)
|
| 151 |
|
| 152 |
+
### License
|
| 153 |
|
| 154 |
+
Apache-2.0 / MIT dual license.
|
| 155 |
|
| 156 |
+
### Citation
|
| 157 |
|
```bibtex
@software{ruvltra2025,
  author = {ruvnet},
  title = {RuvLTRA: Optimized Agent Routing Model for Claude Code},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/ruv/ruvltra}
}
```
|