docs: Update model card for v2.5 - Performance Optimized Edition

Browse files

Files changed (1) hide show

README.md +111 -431

README.md CHANGED Viewed

@@ -2,485 +2,165 @@
 license: apache-2.0
 language:
 - en
 tags:
-- llm
-- code-generation
 - claude-code
-- sona
-- swarm
-- multi-agent
-- gguf
-- quantized
-- edge-ai
-- self-learning
-- ruvector
 - embeddings
-- routing
-- cost-optimization
-- contrastive-learning
-- triplet-loss
-- infonce
-- agent-routing
-- sota
-- task-routing
-- semantic-search
-- ecosystem
-library_name: ruvllm
-pipeline_tag: text-classification
-base_model: Qwen/Qwen2.5-0.5B-Instruct
 datasets:
-- custom
-model-index:
-- name: RuvLTRA Claude Code 0.5B
-  results:
-  - task:
-      type: text-classification
-      name: Agent Routing
-    dataset:
-      type: custom
-      name: Claude Flow Routing Triplets
-    metrics:
-    - type: accuracy
-      value: 0.882
-      name: Embedding-Only Accuracy
-    - type: accuracy
-      value: 1.0
-      name: Hybrid Routing Accuracy
-    - type: accuracy
-      value: 0.812
-      name: Hard Negative Accuracy
-widget:
-- text: "Route: Implement authentication\nAgent:"
-  example_title: Code Task
-- text: "Route: Review the pull request\nAgent:"
-  example_title: Review Task
-- text: "Route: Fix the null pointer bug\nAgent:"
-  example_title: Debug Task
-- text: "Route: Design database schema\nAgent:"
-  example_title: Architecture Task
----
-# RuvLTRA v2.4 - Ecosystem Edition
-<p align="center">
-  <img src="https://img.shields.io/badge/Hybrid_Routing-100%25-brightgreen" alt="Hybrid Accuracy">
-  <img src="https://img.shields.io/badge/Embedding-88.2%25-green" alt="Embedding Accuracy">
-  <img src="https://img.shields.io/badge/GGUF-Q4__K__M-blue" alt="GGUF">
-  <img src="https://img.shields.io/badge/Latency-<10ms-orange" alt="Latency">
-  <img src="https://img.shields.io/badge/Capabilities-388-cyan" alt="Capabilities">
-  <img src="https://img.shields.io/badge/License-Apache%202.0-green" alt="License">
-  <img src="https://img.shields.io/badge/Version-v2.4-purple" alt="Version">
-</p>
-**RuvLTRA** is a collection of optimized models designed for **local routing, embeddings, and task classification** in Claude Code workflows - achieving **100% routing accuracy** with hybrid strategy.
-## What's New in v2.4 (Ecosystem Edition)
-- **2,545 training triplets** (1,078 SOTA + 1,467 ecosystem-specific)
-- **Full ecosystem coverage**: claude-flow, agentic-flow, ruvector
-- **388 total capabilities** across all tools
-- **62 validation tests** with 100% accuracy
-- **30-epoch SOTA training** with 88.2% embedding accuracy
-## Key Philosophy
-> **Benchmark Note:** HumanEval/MBPP don't apply here. RuvLTRA isn't designed to compete with Claude for code generation from scratch.
-### Use Case Comparison
-| Task | RuvLTRA | Claude API |
-|------|---------|------------|
-| Route task to correct agent | Local, fast, **100% accuracy** | Overkill |
-| Generate embeddings for HNSW | Purpose-built | No embedding API |
-| Quick classification/routing | <10ms local | ~500ms+ API |
-| Memory retrieval scoring | Integrated | Not designed for |
-| Complex code generation | Use Claude | Optimal |
-| Multi-step reasoning | Use Claude | Optimal |
----
-## SOTA: 100% Routing Accuracy
-Using **hybrid keyword+embedding strategy** plus **contrastive fine-tuning**, RuvLTRA achieves:
-### SOTA Benchmark Results
-| Metric | Before | After | Method |
-|--------|--------|-------|--------|
-| **Hybrid Routing** | 95% | **100%** | Keyword-First + Embedding Fallback |
-| **Embedding-Only** | 45% | **88.2%** | Contrastive Learning (Triplet + InfoNCE) |
-| **Hard Negatives** | N/A | **81.2%** | Claude Opus 4.5 Generated Pairs |
-### Strategy Comparison (20 test cases)
-| Strategy | RuvLTRA | Qwen Base | Improvement |
-|----------|---------|-----------|-------------|
-| Embedding Only | 88.2% | 40.0% | +48.2 pts |
-| **Keyword-First Hybrid** | **100.0%** | 95.0% | +5 pts |
-### v2.4 Training Enhancements
-| Feature | v2.3 | v2.4 |
-|---------|------|------|
-| Training Triplets | 1,078 | **2,545** |
-| Ecosystem Coverage | Claude Flow only | **Full ecosystem** |
-| Total Capabilities | 179 | **388** |
-| Validation Tests | 20 | **62** |
-| Hard Negative Ratio | 18% | **18%** |
-| Training Epochs | 20 | **30** |
-### Ecosystem Coverage (v2.4)
-| Tool | CLI Commands | Agents | Special Features |
-|------|--------------|--------|------------------|
-| **claude-flow** | 26 (179 subcommands) | 58 types | 27 hooks, 12 workers, 29 skills |
-| **agentic-flow** | 17 commands | 33 types | 32 MCP tools, 9 RL algorithms |
-| **ruvector** | 6 CLI, 22 Rust crates | 12 NPM | 6 attention, 4 graph algorithms |
 ---
-## Cost Savings
-| Operation | Claude API | RuvLTRA Local | Savings |
-|-----------|------------|---------------|---------|
-| Task routing | $0.003 / call | $0 | **100%** |
-| Embedding generation | $0.0001 / call | $0 | **100%** |
-| Latency | ~500ms | <10ms | **50x faster** |
-**Monthly example:** ~$250/month savings (50K routing calls + 100K embeddings)
----
-## Available Models
-| Model | Size | RAM | Latency |
-|-------|------|-----|---------|
-| `ruvltra-claude-code-0.5b-q4_k_m.gguf` | 398 MB | ~500 MB | <10ms |
-| `ruvltra-small-0.5b-q4_k_m.gguf` | 398 MB | ~500 MB | <10ms |
-| `ruvltra-medium-1.1b-q4_k_m.gguf` | 800 MB | ~1 GB | <20ms |
----
-## Quick Start
-### Installation
-```bash
-npm install @ruvector/ruvllm
-# or
-npx ruvector install
-```
-### Automatic Model Download
-```javascript
-const { SemanticRouter } = require('@ruvector/ruvllm');
-// Automatically downloads from HuggingFace if not cached
-const router = new SemanticRouter({
-  model: 'ruvltra-claude-code-0.5b',  // Auto-downloads
-  strategy: 'keyword-first'
-});
-const result = await router.route('Implement authentication system');
-// { agent: 'coder', confidence: 0.92 }
-```
-### Manual Download
-```bash
-wget https://huggingface.co/ruv/ruvltra/resolve/main/ruvltra-claude-code-0.5b-q4_k_m.gguf
-```
-### Python Example
 ```python
-from llama_cpp import Llama
-router = Llama(model_path="ruvltra-claude-code-0.5b-q4_k_m.gguf", n_ctx=512)
-result = router("Route: Add validation\nAgent:", max_tokens=8)
-print(result['choices'][0]['text'])  # -> "coder"
-```
-### Rust Example
-```rust
-use ruvllm::backends::{create_backend, GenerateParams};
-let mut llm = create_backend();
-llm.load_model("ruvltra-claude-code-0.5b-q4_k_m.gguf", Default::default())?;
-let agent = llm.generate("Route: fix bug\nAgent:", GenerateParams::default().with_max_tokens(8))?;
 ```
----
-## Hybrid Routing Algorithm
-The model achieves 100% accuracy using a two-stage routing strategy:
-```
-1. KEYWORD MATCHING (Primary)
-   - Check task for trigger keywords
-   - Priority ordering resolves conflicts
-   - "investigate" -> researcher (priority)
-   - "optimize queries" -> optimizer
-2. EMBEDDING FALLBACK (Secondary)
-   - If no keywords match, use embeddings
-   - Compare task embedding vs agent descriptions
-   - Cosine similarity for ranking
 ```
----
-## Supported Agent Types (58+)
-| Agent | Keywords | Use Cases |
-|-------|----------|-----------|
-| `coder` | implement, build, create | Code implementation |
-| `researcher` | research, investigate, explore | Information gathering |
-| `reviewer` | review, pull request, quality | Code review |
-| `tester` | test, unit, integration | Testing |
-| `architect` | design, architecture, schema | System design |
-| `security-architect` | security, vulnerability, xss | Security analysis |
-| `debugger` | debug, fix, bug, error | Bug fixing |
-| `documenter` | jsdoc, comment, readme | Documentation |
-| `refactorer` | refactor, async/await | Code refactoring |
-| `optimizer` | optimize, cache, performance | Performance |
-| `devops` | deploy, ci/cd, kubernetes | DevOps |
-| `api-docs` | openapi, swagger, api spec | API documentation |
-| `planner` | sprint, plan, roadmap | Project planning |
-### Extended Capabilities (v2.4)
-| Category | Examples |
-|----------|----------|
-| **MCP Tools** | memory_store, agent_spawn, swarm_init, hooks_pre-task |
-| **Swarm Topologies** | hierarchical, mesh, ring, star, adaptive |
-| **Consensus** | byzantine, raft, gossip, crdt, quorum |
-| **Learning** | SONA train, LoRA finetune, EWC++ consolidate, GRPO optimize |
-| **Attention** | flash, multi-head, linear, hyperbolic, MoE |
-| **Graph** | mincut, GNN embed, spectral, pagerank |
-| **Hardware** | Metal GPU, NEON SIMD, ANE neural engine |
----
-## Technical Specifications
-| Specification | Value |
-|--------------|-------|
-| Base Model | Qwen2.5-0.5B-Instruct |
-| Parameters | 494M |
-| Embedding Dimensions | 896 |
-| Quantization | Q4_K_M |
-| File Size | 398 MB |
-| Context Length | 32768 tokens |
----
-## Rust Crates
-| Crate | Description |
-|-------|-------------|
-| **ruvllm** | LLM runtime with SONA learning |
-| **ruvector-core** | HNSW vector database |
-| **ruvector-sona** | Self-optimizing neural architecture |
-| **ruvector-attention** | Attention mechanisms |
-| **ruvector-gnn** | Graph neural network on HNSW |
-| **ruvector-graph** | Distributed hypergraph database |
-```toml
-[dependencies]
-ruvllm = "0.1"
-ruvector-core = { version = "0.1", features = ["hnsw", "simd"] }
-ruvector-sona = { version = "0.1", features = ["serde-support"] }
 ```
----
-## Requirements
-| Component | Minimum |
-|-----------|---------|
-| RAM | 500 MB |
-| Storage | 400 MB |
-| Rust | 1.70+ |
-| Node | 18+ |
----
-## Architecture
-```
-Task --> RuvLTRA --> Agent Type --> Claude API
-         (free)      (100% acc)     (pay here)
-Query --> RuvLTRA --> Embedding --> HNSW --> Context
-          (free)      (free)       (free)    (free)
-```
-**Philosophy:** Simple, frequent decisions -> RuvLTRA (free, <10ms, 100% accurate). Complex reasoning -> Claude API (worth the cost).
----
-<details>
-<summary><b>Training Details</b></summary>
-### Training Data
-| Dataset | Count | Description |
-|---------|-------|-------------|
-| Base Triplets | 578 | Claude Code routing examples |
-| Claude Hard Negatives (Batch 1) | 100 | Opus 4.5 generated confusing pairs |
-| Claude Hard Negatives (Batch 2) | 400 | Additional confusing pairs |
-| Ecosystem Triplets | 1,467 | Full ecosystem coverage |
-| **Total v2.4** | **2,545** | Combined training set |
-### Training Procedure
-```
-Pipeline: Hard Negative Generation -> Contrastive Training -> GRPO Feedback -> GGUF Export
-1. Generate confusing agent pairs using Claude Opus 4.5
-2. Train with Triplet Loss + InfoNCE Loss
-3. Apply GRPO reward scaling from Claude judgments
-4. Export adapter weights for GGUF merging
-```
-### Hyperparameters
-| Parameter | Value |
-|-----------|-------|
-| Learning Rate | 2e-5 |
-| Batch Size | 32 |
-| Epochs | 30 |
-| Triplet Margin | 0.5 |
-| InfoNCE Temperature | 0.07 |
-| Weight Decay | 0.01 |
-| Optimizer | AdamW |
-### Training Infrastructure
-- **Hardware**: Apple Silicon (Metal GPU)
-- **Framework**: Candle (Rust ML)
-- **Training Time**: ~30 seconds for 30 epochs
-- **Final Loss**: 0.168
-</details>
-<details>
-<summary><b>Evaluation Results</b></summary>
-### Benchmark: Claude Flow Agent Routing (20 test cases)
-| Strategy | RuvLTRA | Qwen Base | Improvement |
-|----------|---------|-----------|-------------|
-| Embedding Only | 88.2% | 40.0% | **+48.2 pts** |
-| Keyword Only | 100.0% | 100.0% | same |
-| Hybrid 60/40 | 100.0% | 95.0% | +5.0 pts |
-| **Keyword-First** | **100.0%** | 95.0% | **+5.0 pts** |
-### Per-Agent Accuracy
-| Agent | Accuracy | Test Cases |
-|-------|----------|------------|
-| coder | 100% | 3 |
-| researcher | 100% | 2 |
-| reviewer | 100% | 2 |
-| tester | 100% | 2 |
-| architect | 100% | 2 |
-| security-architect | 100% | 2 |
-| debugger | 100% | 2 |
-| documenter | 100% | 1 |
-| refactorer | 100% | 1 |
-| optimizer | 100% | 1 |
-| devops | 100% | 1 |
-| api-docs | 100% | 1 |
-### Hard Negative Performance
-| Confusing Pair | Accuracy |
-|----------------|----------|
-| coder vs refactorer | 82% |
-| researcher vs architect | 79% |
-| reviewer vs tester | 84% |
-| debugger vs optimizer | 78% |
-| documenter vs api-docs | 85% |
-</details>
-<details>
-<summary><b>Limitations & Intended Use</b></summary>
-### Intended Use
-**Designed For:**
-- Task routing in Claude Code workflows
-- Agent classification (58+ types)
-- Semantic embedding for HNSW search
-- Local inference (<10ms latency)
-- Cost optimization (avoid API calls for routing)
-**NOT Designed For:**
-- General code generation
-- Multi-step reasoning
-- Chat/conversation
-- Languages other than English
-- Agent types beyond supported set
-### Known Limitations
-1. **Fixed Agent Types**: Routes to predefined agents
-2. **English Only**: Training data is English-only
-3. **Domain Specific**: Optimized for software development tasks
-4. **Embedding Fallback**: 88.2% accuracy when keywords don't match
-5. **Context Length**: Optimal for short task descriptions (<100 tokens)
-</details>
-<details>
-<summary><b>Version History</b></summary>
-| Version | Date | Changes |
-|---------|------|---------|
-| **v2.4** | 2025-01-21 | Ecosystem Edition: 2,545 triplets, 388 capabilities, 62 tests |
-| v2.3 | 2025-01-20 | 500+ hard negatives, 48% ratio, GRPO feedback |
-| v2.2 | 2025-01-15 | 100 hard negatives, 18% ratio |
-| v2.1 | 2025-01-10 | Contrastive learning, triplet loss |
-| v2.0 | 2025-01-05 | Hybrid routing strategy |
-| v1.0 | 2024-12-20 | Initial release |
-</details>
-<details>
-<summary><b>Citation</b></summary>
-### BibTeX
 ```bibtex
 @software{ruvltra2025,
-  title = {RuvLTRA: Local Task Routing for Claude Code Workflows},
-  author = {ruv},
   year = {2025},
-  url = {https://huggingface.co/ruv/ruvltra},
-  version = {2.4},
-  license = {Apache-2.0},
-  keywords = {agent-routing, embeddings, claude-code, contrastive-learning, ecosystem}
 }
 ```
-</details>
----
-## License
-Apache 2.0 - Free for commercial and personal use.
-## Links
-- [GitHub Repository](https://github.com/ruvnet/ruvector)
-- [Claude Flow](https://github.com/ruvnet/claude-flow)
-- [Documentation](https://github.com/ruvnet/ruvector/tree/main/docs)
-- [Training Code](https://github.com/ruvnet/ruvector/tree/main/crates/ruvllm/src/training)
-- [NPM Package](https://www.npmjs.com/package/@ruvector/ruvllm)
-## Keywords
-`agent-routing` `task-classification` `claude-code` `embeddings` `semantic-search` `gguf` `quantized` `edge-ai` `local-inference` `contrastive-learning` `triplet-loss` `infonce` `qwen` `llm` `mlops` `cost-optimization` `multi-agent` `swarm` `ruvector` `sona` `ecosystem`

 license: apache-2.0
 language:
 - en
+library_name: ruvllm
 tags:
+- agent-routing
 - claude-code
 - embeddings
+- gguf
+- rust
+- llm-inference
 datasets:
+- ruvnet/claude-flow-routing
+pipeline_tag: text-generation
 ---
+# RuvLTRA - Optimized Agent Routing Model
+## v2.5 - Performance Optimized Edition
+RuvLTRA is a purpose-built model family optimized for Claude Code agent routing, featuring HNSW-indexed pattern matching, zero-copy caching, and SIMD-accelerated inference.
+### What's New in v2.5
+| Optimization | Description | Improvement |
+|--------------|-------------|-------------|
+| **HNSW Index** | Hierarchical Navigable Small World graphs | 10x faster search at 10k entries |
+| **O(1) LRU Cache** | Using Rust `lru` crate | 23.5 ns cache lookups |
+| **Zero-Copy** | Arc<str> string interning | 100-1000x cache improvement |
+| **Batch SIMD** | AVX2/NEON vectorization | 4x throughput |
+| **Memory Pools** | Arena allocation | 50% fewer allocations |
+### Benchmarks
+| Operation | Performance |
+|-----------|-------------|
+| Query decomposition | 340 ns |
+| Cache lookup | 23.5 ns |
+| Memory search (10k entries) | ~0.4 ms |
+| Pattern retrieval | <25 us |
+| Routing accuracy (hybrid) | **100%** |
+| Routing accuracy (embedding-only) | 45% |
+### Models
+| File | Size | Purpose | Context |
+|------|------|---------|---------|
+| `ruvltra-claude-code-0.5b-q4_k_m.gguf` | 398 MB | Agent routing | 32K |
+| `ruvltra-small-0.5b-q4_k_m.gguf` | ~400 MB | General embeddings | 32K |
+| `ruvltra-medium-3b-q4_k_m.gguf` | ~2 GB | Full LLM inference | 256K |
+### Architecture
+| Model | Parameters | Hidden | Layers | GQA | Features |
+|-------|------------|--------|--------|-----|----------|
+| RuvLTRA-Small | 494M | 896 | 24 | 7:1 | SONA hooks, HNSW routing |
+| RuvLTRA-Medium | 3.0B | 2560 | 42 | 8:1 | Flash Attention 2, Speculative Decode |
+### Usage
+#### Python (HuggingFace Hub)
 ```python
+from huggingface_hub import hf_hub_download
+# Download the Claude Code routing model
+model_path = hf_hub_download(
+    repo_id="ruv/ruvltra",
+    filename="ruvltra-claude-code-0.5b-q4_k_m.gguf"
+)
+# Use with llama.cpp or other GGUF-compatible runtimes
 ```
+#### Rust (ruvllm crate)
+```rust
+use ruvllm::hub::{ModelDownloader, DownloadConfig};
+// Download from Hub
+let downloader = ModelDownloader::new(DownloadConfig::default());
+let model_path = downloader.download(
+    "ruv/ruvltra",
+    Some("./models"),
+)?;
+// Load and use
+use ruvllm::prelude::*;
+let mut backend = CandleBackend::with_device(DeviceType::Metal)?;
+backend.load_gguf(&model_path, ModelConfig::default())?;
 ```
+#### JavaScript/TypeScript (npm)
+```typescript
+import { RuvLLM } from '@ruvector/ruvllm';
+const llm = new RuvLLM({
+  model: 'ruv/ruvltra',
+  quantization: 'q4_k_m'
+});
+const result = await llm.route('implement authentication with JWT');
+console.log(result.recommendedAgent); // 'coder'
+console.log(result.confidence); // 0.95
 ```
+### Claude Code Integration
+RuvLTRA powers the intelligent 3-tier routing system in Claude Flow:
+| Tier | Handler | Latency | Use Cases |
+|------|---------|---------|-----------|
+| **1** | Agent Booster | <1ms | Simple transforms (var->const, add-types) |
+| **2** | Haiku | ~500ms | Simple tasks, bug fixes |
+| **3** | Sonnet/Opus | 2-5s | Architecture, security, complex reasoning |
+**Routing accuracy comparison:**
+| Strategy | RuvLTRA | Qwen Base |
+|----------|---------|-----------|
+| Embedding Only | 45% | 40% |
+| Keyword-First (Hybrid) | **100%** | 95% |
+### Training Data
+The Claude Code routing model was trained on:
+- 381 labeled examples covering 60+ agent types
+- 793 contrastive pairs for embedding fine-tuning
+- Synthetic data generated via claude-code-synth.js
+- LoRA fine-tuning on task-specific adapters
+### Performance Targets
+| Metric | Target | Status |
+|--------|--------|--------|
+| Flash Attention | 2.49x-7.47x speedup | Achieved |
+| HNSW Search | 150x-12,500x faster | Achieved |
+| Memory Reduction | 50-75% with quantization | Achieved |
+| MCP Response | <100ms | Achieved |
+| SONA Adaptation | <0.05ms | Achieved |
+### Links
+- **Crate**: [crates.io/crates/ruvllm](https://crates.io/crates/ruvllm)
+- **npm**: [npmjs.com/package/@ruvector/ruvllm](https://www.npmjs.com/package/@ruvector/ruvllm)
+- **Docs**: [docs.rs/ruvllm](https://docs.rs/ruvllm)
+- **GitHub**: [github.com/ruvnet/ruvector](https://github.com/ruvnet/ruvector)
+- **Claude Flow**: [github.com/ruvnet/claude-flow](https://github.com/ruvnet/claude-flow)
+### License
+Apache-2.0 / MIT dual license.
+### Citation
 ```bibtex
 @software{ruvltra2025,
+  author = {ruvnet},
+  title = {RuvLTRA: Optimized Agent Routing Model for Claude Code},
   year = {2025},
+  publisher = {HuggingFace},
+  url = {https://huggingface.co/ruv/ruvltra}
 }
 ```