ruv commited on
Commit
c316a38
·
verified ·
1 Parent(s): be2c118

docs: Update model card for v2.5 - Performance Optimized Edition

Browse files
Files changed (1) hide show
  1. README.md +111 -431
README.md CHANGED
@@ -2,485 +2,165 @@
2
  license: apache-2.0
3
  language:
4
  - en
 
5
  tags:
6
- - llm
7
- - code-generation
8
  - claude-code
9
- - sona
10
- - swarm
11
- - multi-agent
12
- - gguf
13
- - quantized
14
- - edge-ai
15
- - self-learning
16
- - ruvector
17
  - embeddings
18
- - routing
19
- - cost-optimization
20
- - contrastive-learning
21
- - triplet-loss
22
- - infonce
23
- - agent-routing
24
- - sota
25
- - task-routing
26
- - semantic-search
27
- - ecosystem
28
- library_name: ruvllm
29
- pipeline_tag: text-classification
30
- base_model: Qwen/Qwen2.5-0.5B-Instruct
31
  datasets:
32
- - custom
33
- model-index:
34
- - name: RuvLTRA Claude Code 0.5B
35
- results:
36
- - task:
37
- type: text-classification
38
- name: Agent Routing
39
- dataset:
40
- type: custom
41
- name: Claude Flow Routing Triplets
42
- metrics:
43
- - type: accuracy
44
- value: 0.882
45
- name: Embedding-Only Accuracy
46
- - type: accuracy
47
- value: 1.0
48
- name: Hybrid Routing Accuracy
49
- - type: accuracy
50
- value: 0.812
51
- name: Hard Negative Accuracy
52
- widget:
53
- - text: "Route: Implement authentication\nAgent:"
54
- example_title: Code Task
55
- - text: "Route: Review the pull request\nAgent:"
56
- example_title: Review Task
57
- - text: "Route: Fix the null pointer bug\nAgent:"
58
- example_title: Debug Task
59
- - text: "Route: Design database schema\nAgent:"
60
- example_title: Architecture Task
61
- ---
62
-
63
- # RuvLTRA v2.4 - Ecosystem Edition
64
-
65
- <p align="center">
66
- <img src="https://img.shields.io/badge/Hybrid_Routing-100%25-brightgreen" alt="Hybrid Accuracy">
67
- <img src="https://img.shields.io/badge/Embedding-88.2%25-green" alt="Embedding Accuracy">
68
- <img src="https://img.shields.io/badge/GGUF-Q4__K__M-blue" alt="GGUF">
69
- <img src="https://img.shields.io/badge/Latency-<10ms-orange" alt="Latency">
70
- <img src="https://img.shields.io/badge/Capabilities-388-cyan" alt="Capabilities">
71
- <img src="https://img.shields.io/badge/License-Apache%202.0-green" alt="License">
72
- <img src="https://img.shields.io/badge/Version-v2.4-purple" alt="Version">
73
- </p>
74
-
75
- **RuvLTRA** is a collection of optimized models designed for **local routing, embeddings, and task classification** in Claude Code workflows - achieving **100% routing accuracy** with hybrid strategy.
76
-
77
- ## What's New in v2.4 (Ecosystem Edition)
78
-
79
- - **2,545 training triplets** (1,078 SOTA + 1,467 ecosystem-specific)
80
- - **Full ecosystem coverage**: claude-flow, agentic-flow, ruvector
81
- - **388 total capabilities** across all tools
82
- - **62 validation tests** with 100% accuracy
83
- - **30-epoch SOTA training** with 88.2% embedding accuracy
84
-
85
- ## Key Philosophy
86
-
87
- > **Benchmark Note:** HumanEval/MBPP don't apply here. RuvLTRA isn't designed to compete with Claude for code generation from scratch.
88
-
89
- ### Use Case Comparison
90
-
91
- | Task | RuvLTRA | Claude API |
92
- |------|---------|------------|
93
- | Route task to correct agent | Local, fast, **100% accuracy** | Overkill |
94
- | Generate embeddings for HNSW | Purpose-built | No embedding API |
95
- | Quick classification/routing | <10ms local | ~500ms+ API |
96
- | Memory retrieval scoring | Integrated | Not designed for |
97
- | Complex code generation | Use Claude | Optimal |
98
- | Multi-step reasoning | Use Claude | Optimal |
99
-
100
- ---
101
-
102
- ## SOTA: 100% Routing Accuracy
103
-
104
- Using **hybrid keyword+embedding strategy** plus **contrastive fine-tuning**, RuvLTRA achieves:
105
-
106
- ### SOTA Benchmark Results
107
-
108
- | Metric | Before | After | Method |
109
- |--------|--------|-------|--------|
110
- | **Hybrid Routing** | 95% | **100%** | Keyword-First + Embedding Fallback |
111
- | **Embedding-Only** | 45% | **88.2%** | Contrastive Learning (Triplet + InfoNCE) |
112
- | **Hard Negatives** | N/A | **81.2%** | Claude Opus 4.5 Generated Pairs |
113
-
114
- ### Strategy Comparison (20 test cases)
115
-
116
- | Strategy | RuvLTRA | Qwen Base | Improvement |
117
- |----------|---------|-----------|-------------|
118
- | Embedding Only | 88.2% | 40.0% | +48.2 pts |
119
- | **Keyword-First Hybrid** | **100.0%** | 95.0% | +5 pts |
120
-
121
- ### v2.4 Training Enhancements
122
-
123
- | Feature | v2.3 | v2.4 |
124
- |---------|------|------|
125
- | Training Triplets | 1,078 | **2,545** |
126
- | Ecosystem Coverage | Claude Flow only | **Full ecosystem** |
127
- | Total Capabilities | 179 | **388** |
128
- | Validation Tests | 20 | **62** |
129
- | Hard Negative Ratio | 18% | **18%** |
130
- | Training Epochs | 20 | **30** |
131
-
132
- ### Ecosystem Coverage (v2.4)
133
-
134
- | Tool | CLI Commands | Agents | Special Features |
135
- |------|--------------|--------|------------------|
136
- | **claude-flow** | 26 (179 subcommands) | 58 types | 27 hooks, 12 workers, 29 skills |
137
- | **agentic-flow** | 17 commands | 33 types | 32 MCP tools, 9 RL algorithms |
138
- | **ruvector** | 6 CLI, 22 Rust crates | 12 NPM | 6 attention, 4 graph algorithms |
139
-
140
  ---
141
 
142
- ## Cost Savings
143
 
144
- | Operation | Claude API | RuvLTRA Local | Savings |
145
- |-----------|------------|---------------|---------|
146
- | Task routing | $0.003 / call | $0 | **100%** |
147
- | Embedding generation | $0.0001 / call | $0 | **100%** |
148
- | Latency | ~500ms | <10ms | **50x faster** |
149
 
150
- **Monthly example:** ~$250/month savings (50K routing calls + 100K embeddings)
151
 
152
- ---
153
 
154
- ## Available Models
 
 
 
 
 
 
155
 
156
- | Model | Size | RAM | Latency |
157
- |-------|------|-----|---------|
158
- | `ruvltra-claude-code-0.5b-q4_k_m.gguf` | 398 MB | ~500 MB | <10ms |
159
- | `ruvltra-small-0.5b-q4_k_m.gguf` | 398 MB | ~500 MB | <10ms |
160
- | `ruvltra-medium-1.1b-q4_k_m.gguf` | 800 MB | ~1 GB | <20ms |
161
 
162
- ---
 
 
 
 
 
 
 
163
 
164
- ## Quick Start
165
 
166
- ### Installation
167
- ```bash
168
- npm install @ruvector/ruvllm
169
- # or
170
- npx ruvector install
171
- ```
172
 
173
- ### Automatic Model Download
174
- ```javascript
175
- const { SemanticRouter } = require('@ruvector/ruvllm');
176
 
177
- // Automatically downloads from HuggingFace if not cached
178
- const router = new SemanticRouter({
179
- model: 'ruvltra-claude-code-0.5b', // Auto-downloads
180
- strategy: 'keyword-first'
181
- });
182
 
183
- const result = await router.route('Implement authentication system');
184
- // { agent: 'coder', confidence: 0.92 }
185
- ```
186
 
187
- ### Manual Download
188
- ```bash
189
- wget https://huggingface.co/ruv/ruvltra/resolve/main/ruvltra-claude-code-0.5b-q4_k_m.gguf
190
- ```
191
 
192
- ### Python Example
193
  ```python
194
- from llama_cpp import Llama
195
 
196
- router = Llama(model_path="ruvltra-claude-code-0.5b-q4_k_m.gguf", n_ctx=512)
197
- result = router("Route: Add validation\nAgent:", max_tokens=8)
198
- print(result['choices'][0]['text']) # -> "coder"
199
- ```
 
200
 
201
- ### Rust Example
202
- ```rust
203
- use ruvllm::backends::{create_backend, GenerateParams};
204
-
205
- let mut llm = create_backend();
206
- llm.load_model("ruvltra-claude-code-0.5b-q4_k_m.gguf", Default::default())?;
207
-
208
- let agent = llm.generate("Route: fix bug\nAgent:", GenerateParams::default().with_max_tokens(8))?;
209
  ```
210
 
211
- ---
212
-
213
- ## Hybrid Routing Algorithm
214
 
215
- The model achieves 100% accuracy using a two-stage routing strategy:
216
-
217
- ```
218
- 1. KEYWORD MATCHING (Primary)
219
- - Check task for trigger keywords
220
- - Priority ordering resolves conflicts
221
- - "investigate" -> researcher (priority)
222
- - "optimize queries" -> optimizer
223
-
224
- 2. EMBEDDING FALLBACK (Secondary)
225
- - If no keywords match, use embeddings
226
- - Compare task embedding vs agent descriptions
227
- - Cosine similarity for ranking
 
228
  ```
229
 
230
- ---
231
-
232
- ## Supported Agent Types (58+)
233
-
234
- | Agent | Keywords | Use Cases |
235
- |-------|----------|-----------|
236
- | `coder` | implement, build, create | Code implementation |
237
- | `researcher` | research, investigate, explore | Information gathering |
238
- | `reviewer` | review, pull request, quality | Code review |
239
- | `tester` | test, unit, integration | Testing |
240
- | `architect` | design, architecture, schema | System design |
241
- | `security-architect` | security, vulnerability, xss | Security analysis |
242
- | `debugger` | debug, fix, bug, error | Bug fixing |
243
- | `documenter` | jsdoc, comment, readme | Documentation |
244
- | `refactorer` | refactor, async/await | Code refactoring |
245
- | `optimizer` | optimize, cache, performance | Performance |
246
- | `devops` | deploy, ci/cd, kubernetes | DevOps |
247
- | `api-docs` | openapi, swagger, api spec | API documentation |
248
- | `planner` | sprint, plan, roadmap | Project planning |
249
-
250
- ### Extended Capabilities (v2.4)
251
-
252
- | Category | Examples |
253
- |----------|----------|
254
- | **MCP Tools** | memory_store, agent_spawn, swarm_init, hooks_pre-task |
255
- | **Swarm Topologies** | hierarchical, mesh, ring, star, adaptive |
256
- | **Consensus** | byzantine, raft, gossip, crdt, quorum |
257
- | **Learning** | SONA train, LoRA finetune, EWC++ consolidate, GRPO optimize |
258
- | **Attention** | flash, multi-head, linear, hyperbolic, MoE |
259
- | **Graph** | mincut, GNN embed, spectral, pagerank |
260
- | **Hardware** | Metal GPU, NEON SIMD, ANE neural engine |
261
-
262
- ---
263
 
264
- ## Technical Specifications
 
265
 
266
- | Specification | Value |
267
- |--------------|-------|
268
- | Base Model | Qwen2.5-0.5B-Instruct |
269
- | Parameters | 494M |
270
- | Embedding Dimensions | 896 |
271
- | Quantization | Q4_K_M |
272
- | File Size | 398 MB |
273
- | Context Length | 32768 tokens |
274
-
275
- ---
276
 
277
- ## Rust Crates
278
-
279
- | Crate | Description |
280
- |-------|-------------|
281
- | **ruvllm** | LLM runtime with SONA learning |
282
- | **ruvector-core** | HNSW vector database |
283
- | **ruvector-sona** | Self-optimizing neural architecture |
284
- | **ruvector-attention** | Attention mechanisms |
285
- | **ruvector-gnn** | Graph neural network on HNSW |
286
- | **ruvector-graph** | Distributed hypergraph database |
287
-
288
- ```toml
289
- [dependencies]
290
- ruvllm = "0.1"
291
- ruvector-core = { version = "0.1", features = ["hnsw", "simd"] }
292
- ruvector-sona = { version = "0.1", features = ["serde-support"] }
293
  ```
294
 
295
- ---
296
 
297
- ## Requirements
298
 
299
- | Component | Minimum |
300
- |-----------|---------|
301
- | RAM | 500 MB |
302
- | Storage | 400 MB |
303
- | Rust | 1.70+ |
304
- | Node | 18+ |
305
 
306
- ---
307
 
308
- ## Architecture
 
 
 
309
 
310
- ```
311
- Task --> RuvLTRA --> Agent Type --> Claude API
312
- (free) (100% acc) (pay here)
313
 
314
- Query --> RuvLTRA --> Embedding --> HNSW --> Context
315
- (free) (free) (free) (free)
316
- ```
 
 
317
 
318
- **Philosophy:** Simple, frequent decisions -> RuvLTRA (free, <10ms, 100% accurate). Complex reasoning -> Claude API (worth the cost).
319
 
320
- ---
 
 
 
 
 
 
321
 
322
- <details>
323
- <summary><b>Training Details</b></summary>
324
 
325
- ### Training Data
 
 
 
 
326
 
327
- | Dataset | Count | Description |
328
- |---------|-------|-------------|
329
- | Base Triplets | 578 | Claude Code routing examples |
330
- | Claude Hard Negatives (Batch 1) | 100 | Opus 4.5 generated confusing pairs |
331
- | Claude Hard Negatives (Batch 2) | 400 | Additional confusing pairs |
332
- | Ecosystem Triplets | 1,467 | Full ecosystem coverage |
333
- | **Total v2.4** | **2,545** | Combined training set |
334
 
335
- ### Training Procedure
336
 
337
- ```
338
- Pipeline: Hard Negative Generation -> Contrastive Training -> GRPO Feedback -> GGUF Export
339
-
340
- 1. Generate confusing agent pairs using Claude Opus 4.5
341
- 2. Train with Triplet Loss + InfoNCE Loss
342
- 3. Apply GRPO reward scaling from Claude judgments
343
- 4. Export adapter weights for GGUF merging
344
- ```
345
-
346
- ### Hyperparameters
347
-
348
- | Parameter | Value |
349
- |-----------|-------|
350
- | Learning Rate | 2e-5 |
351
- | Batch Size | 32 |
352
- | Epochs | 30 |
353
- | Triplet Margin | 0.5 |
354
- | InfoNCE Temperature | 0.07 |
355
- | Weight Decay | 0.01 |
356
- | Optimizer | AdamW |
357
-
358
- ### Training Infrastructure
359
-
360
- - **Hardware**: Apple Silicon (Metal GPU)
361
- - **Framework**: Candle (Rust ML)
362
- - **Training Time**: ~30 seconds for 30 epochs
363
- - **Final Loss**: 0.168
364
-
365
- </details>
366
-
367
- <details>
368
- <summary><b>Evaluation Results</b></summary>
369
-
370
- ### Benchmark: Claude Flow Agent Routing (20 test cases)
371
-
372
- | Strategy | RuvLTRA | Qwen Base | Improvement |
373
- |----------|---------|-----------|-------------|
374
- | Embedding Only | 88.2% | 40.0% | **+48.2 pts** |
375
- | Keyword Only | 100.0% | 100.0% | same |
376
- | Hybrid 60/40 | 100.0% | 95.0% | +5.0 pts |
377
- | **Keyword-First** | **100.0%** | 95.0% | **+5.0 pts** |
378
-
379
- ### Per-Agent Accuracy
380
-
381
- | Agent | Accuracy | Test Cases |
382
- |-------|----------|------------|
383
- | coder | 100% | 3 |
384
- | researcher | 100% | 2 |
385
- | reviewer | 100% | 2 |
386
- | tester | 100% | 2 |
387
- | architect | 100% | 2 |
388
- | security-architect | 100% | 2 |
389
- | debugger | 100% | 2 |
390
- | documenter | 100% | 1 |
391
- | refactorer | 100% | 1 |
392
- | optimizer | 100% | 1 |
393
- | devops | 100% | 1 |
394
- | api-docs | 100% | 1 |
395
-
396
- ### Hard Negative Performance
397
-
398
- | Confusing Pair | Accuracy |
399
- |----------------|----------|
400
- | coder vs refactorer | 82% |
401
- | researcher vs architect | 79% |
402
- | reviewer vs tester | 84% |
403
- | debugger vs optimizer | 78% |
404
- | documenter vs api-docs | 85% |
405
-
406
- </details>
407
-
408
- <details>
409
- <summary><b>Limitations & Intended Use</b></summary>
410
-
411
- ### Intended Use
412
-
413
- **Designed For:**
414
- - Task routing in Claude Code workflows
415
- - Agent classification (58+ types)
416
- - Semantic embedding for HNSW search
417
- - Local inference (<10ms latency)
418
- - Cost optimization (avoid API calls for routing)
419
-
420
- **NOT Designed For:**
421
- - General code generation
422
- - Multi-step reasoning
423
- - Chat/conversation
424
- - Languages other than English
425
- - Agent types beyond supported set
426
-
427
- ### Known Limitations
428
-
429
- 1. **Fixed Agent Types**: Routes to predefined agents
430
- 2. **English Only**: Training data is English-only
431
- 3. **Domain Specific**: Optimized for software development tasks
432
- 4. **Embedding Fallback**: 88.2% accuracy when keywords don't match
433
- 5. **Context Length**: Optimal for short task descriptions (<100 tokens)
434
-
435
- </details>
436
-
437
- <details>
438
- <summary><b>Version History</b></summary>
439
-
440
- | Version | Date | Changes |
441
- |---------|------|---------|
442
- | **v2.4** | 2025-01-21 | Ecosystem Edition: 2,545 triplets, 388 capabilities, 62 tests |
443
- | v2.3 | 2025-01-20 | 500+ hard negatives, 48% ratio, GRPO feedback |
444
- | v2.2 | 2025-01-15 | 100 hard negatives, 18% ratio |
445
- | v2.1 | 2025-01-10 | Contrastive learning, triplet loss |
446
- | v2.0 | 2025-01-05 | Hybrid routing strategy |
447
- | v1.0 | 2024-12-20 | Initial release |
448
-
449
- </details>
450
-
451
- <details>
452
- <summary><b>Citation</b></summary>
453
-
454
- ### BibTeX
455
 
456
  ```bibtex
457
  @software{ruvltra2025,
458
- title = {RuvLTRA: Local Task Routing for Claude Code Workflows},
459
- author = {ruv},
460
  year = {2025},
461
- url = {https://huggingface.co/ruv/ruvltra},
462
- version = {2.4},
463
- license = {Apache-2.0},
464
- keywords = {agent-routing, embeddings, claude-code, contrastive-learning, ecosystem}
465
  }
466
  ```
467
-
468
- </details>
469
-
470
- ---
471
-
472
- ## License
473
-
474
- Apache 2.0 - Free for commercial and personal use.
475
-
476
- ## Links
477
-
478
- - [GitHub Repository](https://github.com/ruvnet/ruvector)
479
- - [Claude Flow](https://github.com/ruvnet/claude-flow)
480
- - [Documentation](https://github.com/ruvnet/ruvector/tree/main/docs)
481
- - [Training Code](https://github.com/ruvnet/ruvector/tree/main/crates/ruvllm/src/training)
482
- - [NPM Package](https://www.npmjs.com/package/@ruvector/ruvllm)
483
-
484
- ## Keywords
485
-
486
- `agent-routing` `task-classification` `claude-code` `embeddings` `semantic-search` `gguf` `quantized` `edge-ai` `local-inference` `contrastive-learning` `triplet-loss` `infonce` `qwen` `llm` `mlops` `cost-optimization` `multi-agent` `swarm` `ruvector` `sona` `ecosystem`
 
2
  license: apache-2.0
3
  language:
4
  - en
5
+ library_name: ruvllm
6
  tags:
7
+ - agent-routing
 
8
  - claude-code
 
 
 
 
 
 
 
 
9
  - embeddings
10
+ - gguf
11
+ - rust
12
+ - llm-inference
 
 
 
 
 
 
 
 
 
 
13
  datasets:
14
+ - ruvnet/claude-flow-routing
15
+ pipeline_tag: text-generation
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
16
  ---
17
 
18
+ # RuvLTRA - Optimized Agent Routing Model
19
 
20
+ ## v2.5 - Performance Optimized Edition
 
 
 
 
21
 
22
+ RuvLTRA is a purpose-built model family optimized for Claude Code agent routing, featuring HNSW-indexed pattern matching, zero-copy caching, and SIMD-accelerated inference.
23
 
24
+ ### What's New in v2.5
25
 
26
+ | Optimization | Description | Improvement |
27
+ |--------------|-------------|-------------|
28
+ | **HNSW Index** | Hierarchical Navigable Small World graphs | 10x faster search at 10k entries |
29
+ | **O(1) LRU Cache** | Using Rust `lru` crate | 23.5 ns cache lookups |
30
+ | **Zero-Copy** | Arc<str> string interning | 100-1000x cache improvement |
31
+ | **Batch SIMD** | AVX2/NEON vectorization | 4x throughput |
32
+ | **Memory Pools** | Arena allocation | 50% fewer allocations |
33
 
34
+ ### Benchmarks
 
 
 
 
35
 
36
+ | Operation | Performance |
37
+ |-----------|-------------|
38
+ | Query decomposition | 340 ns |
39
+ | Cache lookup | 23.5 ns |
40
+ | Memory search (10k entries) | ~0.4 ms |
41
+ | Pattern retrieval | <25 us |
42
+ | Routing accuracy (hybrid) | **100%** |
43
+ | Routing accuracy (embedding-only) | 45% |
44
 
45
+ ### Models
46
 
47
+ | File | Size | Purpose | Context |
48
+ |------|------|---------|---------|
49
+ | `ruvltra-claude-code-0.5b-q4_k_m.gguf` | 398 MB | Agent routing | 32K |
50
+ | `ruvltra-small-0.5b-q4_k_m.gguf` | ~400 MB | General embeddings | 32K |
51
+ | `ruvltra-medium-3b-q4_k_m.gguf` | ~2 GB | Full LLM inference | 256K |
 
52
 
53
+ ### Architecture
 
 
54
 
55
+ | Model | Parameters | Hidden | Layers | GQA | Features |
56
+ |-------|------------|--------|--------|-----|----------|
57
+ | RuvLTRA-Small | 494M | 896 | 24 | 7:1 | SONA hooks, HNSW routing |
58
+ | RuvLTRA-Medium | 3.0B | 2560 | 42 | 8:1 | Flash Attention 2, Speculative Decode |
 
59
 
60
+ ### Usage
 
 
61
 
62
+ #### Python (HuggingFace Hub)
 
 
 
63
 
 
64
  ```python
65
+ from huggingface_hub import hf_hub_download
66
 
67
+ # Download the Claude Code routing model
68
+ model_path = hf_hub_download(
69
+ repo_id="ruv/ruvltra",
70
+ filename="ruvltra-claude-code-0.5b-q4_k_m.gguf"
71
+ )
72
 
73
+ # Use with llama.cpp or other GGUF-compatible runtimes
 
 
 
 
 
 
 
74
  ```
75
 
76
+ #### Rust (ruvllm crate)
 
 
77
 
78
+ ```rust
79
+ use ruvllm::hub::{ModelDownloader, DownloadConfig};
80
+
81
+ // Download from Hub
82
+ let downloader = ModelDownloader::new(DownloadConfig::default());
83
+ let model_path = downloader.download(
84
+ "ruv/ruvltra",
85
+ Some("./models"),
86
+ )?;
87
+
88
+ // Load and use
89
+ use ruvllm::prelude::*;
90
+ let mut backend = CandleBackend::with_device(DeviceType::Metal)?;
91
+ backend.load_gguf(&model_path, ModelConfig::default())?;
92
  ```
93
 
94
+ #### JavaScript/TypeScript (npm)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
95
 
96
+ ```typescript
97
+ import { RuvLLM } from '@ruvector/ruvllm';
98
 
99
+ const llm = new RuvLLM({
100
+ model: 'ruv/ruvltra',
101
+ quantization: 'q4_k_m'
102
+ });
 
 
 
 
 
 
103
 
104
+ const result = await llm.route('implement authentication with JWT');
105
+ console.log(result.recommendedAgent); // 'coder'
106
+ console.log(result.confidence); // 0.95
 
 
 
 
 
 
 
 
 
 
 
 
 
107
  ```
108
 
109
+ ### Claude Code Integration
110
 
111
+ RuvLTRA powers the intelligent 3-tier routing system in Claude Flow:
112
 
113
+ | Tier | Handler | Latency | Use Cases |
114
+ |------|---------|---------|-----------|
115
+ | **1** | Agent Booster | <1ms | Simple transforms (var->const, add-types) |
116
+ | **2** | Haiku | ~500ms | Simple tasks, bug fixes |
117
+ | **3** | Sonnet/Opus | 2-5s | Architecture, security, complex reasoning |
 
118
 
119
+ **Routing accuracy comparison:**
120
 
121
+ | Strategy | RuvLTRA | Qwen Base |
122
+ |----------|---------|-----------|
123
+ | Embedding Only | 45% | 40% |
124
+ | Keyword-First (Hybrid) | **100%** | 95% |
125
 
126
+ ### Training Data
 
 
127
 
128
+ The Claude Code routing model was trained on:
129
+ - 381 labeled examples covering 60+ agent types
130
+ - 793 contrastive pairs for embedding fine-tuning
131
+ - Synthetic data generated via claude-code-synth.js
132
+ - LoRA fine-tuning on task-specific adapters
133
 
134
+ ### Performance Targets
135
 
136
+ | Metric | Target | Status |
137
+ |--------|--------|--------|
138
+ | Flash Attention | 2.49x-7.47x speedup | Achieved |
139
+ | HNSW Search | 150x-12,500x faster | Achieved |
140
+ | Memory Reduction | 50-75% with quantization | Achieved |
141
+ | MCP Response | <100ms | Achieved |
142
+ | SONA Adaptation | <0.05ms | Achieved |
143
 
144
+ ### Links
 
145
 
146
+ - **Crate**: [crates.io/crates/ruvllm](https://crates.io/crates/ruvllm)
147
+ - **npm**: [npmjs.com/package/@ruvector/ruvllm](https://www.npmjs.com/package/@ruvector/ruvllm)
148
+ - **Docs**: [docs.rs/ruvllm](https://docs.rs/ruvllm)
149
+ - **GitHub**: [github.com/ruvnet/ruvector](https://github.com/ruvnet/ruvector)
150
+ - **Claude Flow**: [github.com/ruvnet/claude-flow](https://github.com/ruvnet/claude-flow)
151
 
152
+ ### License
 
 
 
 
 
 
153
 
154
+ Apache-2.0 / MIT dual license.
155
 
156
+ ### Citation
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
157
 
158
  ```bibtex
159
  @software{ruvltra2025,
160
+ author = {ruvnet},
161
+ title = {RuvLTRA: Optimized Agent Routing Model for Claude Code},
162
  year = {2025},
163
+ publisher = {HuggingFace},
164
+ url = {https://huggingface.co/ruv/ruvltra}
 
 
165
  }
166
  ```