---
title: EnterpriseOps Arena
emoji: 🏢
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
# EnterpriseOps Arena 🏢
> Multi-agent RL environment where IT, Manager, Finance and Oversight agents
> collaborate to manage a simulated enterprise under partial observability,
> schema drift, and SLA pressure.
## Quick Links
- **HuggingFace Space**: https://huggingface.co/spaces/Anurag137/enterprise-ops-arena
- **Colab Notebook**: https://github.com/anuragverma025/Meta-Hackathon/blob/main/enterprise_ops/train/colab_notebook.ipynb
- **Blog Post**: https://github.com/anuragverma025/Meta-Hackathon/blob/main/BLOG.md
- **GitHub**: https://github.com/anuragverma025/Meta-Hackathon
---
## The Problem
Enterprise AI agents often fail because they work in silos. The IT agent
resolving a critical server ticket does not know that the Finance agent
just blocked the budget it needs. The Manager does not know which
tickets are about to breach SLA. Without coordination, failures cascade.
We built an RL environment that trains LLM agents to coordinate
across departments, developing theory-of-mind reasoning through
reinforcement learning.
---
## The Environment
4 specialized LLM agents operate inside a simulated enterprise:
| Agent | Role | Sees |
|---|---|---|
| IT Agent | Resolves support tickets before SLA breach | Tickets + resource pool + inbox |
| Manager Agent | Allocates shared resources, coordinates tasks | All dept summaries + project tasks |
| Finance Agent | Approves budgets, blocks policy violations | Budget history + pending approvals |
| Oversight Agent | Monitors all agents, catches hallucinations | ALL tool call logs (full visibility) |
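Partial observability means each agent's view is a filtered slice of one shared state. A minimal sketch of that filtering, assuming the state is a single Python object; `EnterpriseState`, `VISIBLE_FIELDS`, and `observe()` are hypothetical names that mirror the "Sees" column above, not the project's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class EnterpriseState:
    """Hypothetical shared simulator state; field names are illustrative."""
    tickets: list[dict[str, Any]] = field(default_factory=list)
    resource_pool: dict[str, int] = field(default_factory=dict)
    inboxes: dict[str, list[str]] = field(default_factory=dict)  # role -> messages
    dept_summaries: dict[str, str] = field(default_factory=dict)
    project_tasks: list[dict[str, Any]] = field(default_factory=list)
    budget_history: list[dict[str, Any]] = field(default_factory=list)
    pending_approvals: list[dict[str, Any]] = field(default_factory=list)
    tool_call_log: list[dict[str, Any]] = field(default_factory=list)


# What each role may read, following the "Sees" column of the table.
VISIBLE_FIELDS = {
    "it": ("tickets", "resource_pool", "inbox"),
    "manager": ("dept_summaries", "project_tasks"),
    "finance": ("budget_history", "pending_approvals"),
    "oversight": ("tool_call_log",),
}


def observe(state: EnterpriseState, role: str) -> dict[str, Any]:
    """Build one agent's partial observation from the full state."""
    obs: dict[str, Any] = {}
    for name in VISIBLE_FIELDS[role]:
        if name == "inbox":
            # Only the agent's own messages, never other agents' inboxes.
            obs[name] = list(state.inboxes.get(role, []))
        else:
            obs[name] = getattr(state, name)
    return obs
```

Keeping the visibility rules in one table makes it easy to audit that no role (other than Oversight, which reads the tool-call log) sees another department's data.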
### Key environment features
- **Partial observability**: each agent except Oversight sees only its own department
- **5 mock enterprise APIs**: `get_tickets`, `resolve_ticket`, `allocate_resource`, `approve_budget`, `get_project_status`
- **Schema drift**: API fields mutate every 20 steps, forcing real adaptation
- **8 scenarios**: difficulty 1 to 8, from simple IT tasks to full enterprise chaos
- **Message bus**: agents coordinate by sending structured messages
- **Anti-reward-hacking**: timeout, loop detection, state locks, oversight monitoring
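One way to picture schema drift is a layer that renames response fields on a fixed interval, so an agent that memorized old field names starts failing. A minimal sketch, assuming drift is implemented as field renaming; `DriftingSchema`, `FIELD_ALIASES`, and the alias names are hypothetical, and only the 20-step interval comes from the list above.

```python
import random

DRIFT_INTERVAL = 20  # from the README: API fields mutate every 20 steps

# Hypothetical canonical ticket fields and the names they may drift to.
# Alias lists are disjoint, so two fields never share a name.
FIELD_ALIASES = {
    "id": ["id", "ticket_id"],
    "priority": ["priority", "severity", "urgency"],
    "deadline": ["deadline", "sla_due", "due_at"],
    "status": ["status", "state"],
}


class DriftingSchema:
    """Renames response fields on a fixed interval (illustrative only)."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.mapping = {name: name for name in FIELD_ALIASES}  # canonical -> exposed name
        self.version = 0

    def maybe_drift(self, step: int) -> bool:
        """Re-sample exposed names at every positive multiple of DRIFT_INTERVAL.

        A re-sample can leave some names unchanged.
        """
        if step > 0 and step % DRIFT_INTERVAL == 0:
            self.mapping = {k: self.rng.choice(v) for k, v in FIELD_ALIASES.items()}
            self.version += 1
            return True
        return False

    def render(self, record: dict) -> dict:
        """Expose a record (which must use canonical keys) under the current schema."""
        return {self.mapping[k]: v for k, v in record.items()}

    def classify_field(self, name: str) -> str:
        """Label a field name an agent used: valid, stale, or hallucinated."""
        if name in self.mapping.values():
            return "valid"
        if any(name in aliases for aliases in FIELD_ALIASES.values()):
            return "stale"  # a known alias that is not current
        return "hallucinated"
```

The three labels line up with the reward design: a "hallucinated" name would fall under `hallucination_penalty`, while a "stale" one is the kind of usage the Oversight Agent is rewarded for flagging.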
---
## Reward Design
4 independent reward functions (composable, hard to game):
| Function | Signal |
|---|---|
| task_completion | +10 per resolved ticket/task, verified by state diff |
| sla_adherence | +7.5 before deadline, -5 on breach |
| coordination_bonus | +6 when message leads to correct action next step |
| hallucination_penalty | -8 for calling non-existent API fields |
The Oversight Agent earns +15 for catching hallucinations, +8 for
flagging policy breaches, and +5 for flagging stale schema usage.
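Because the four functions are independent, they can be summed per step and still be logged separately. A minimal sketch, assuming each step is summarized as a set of counts; `StepOutcome`, `OversightOutcome`, and the function bodies are hypothetical and only encode the point values listed above (treating "+7.5 before deadline" as per on-time resolution is also an assumption).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Hypothetical per-step counts for one agent; field names are illustrative."""
    resolved: int = 0            # tickets/tasks resolved, verified by state diff
    resolved_on_time: int = 0    # resolutions that landed before their SLA deadline
    sla_breaches: int = 0        # deadlines missed this step
    messages_acted_on: int = 0   # messages followed by the correct action on the next step
    hallucinated_calls: int = 0  # tool calls that referenced non-existent API fields


def task_completion(o: StepOutcome) -> float:
    return 10.0 * o.resolved


def sla_adherence(o: StepOutcome) -> float:
    return 7.5 * o.resolved_on_time - 5.0 * o.sla_breaches


def coordination_bonus(o: StepOutcome) -> float:
    return 6.0 * o.messages_acted_on


def hallucination_penalty(o: StepOutcome) -> float:
    return -8.0 * o.hallucinated_calls


REWARD_FNS = (task_completion, sla_adherence, coordination_bonus, hallucination_penalty)


def reward_breakdown(o: StepOutcome) -> dict[str, float]:
    """Per-component rewards, useful for logging which term drives the total."""
    return {fn.__name__: fn(o) for fn in REWARD_FNS}


def total_reward(o: StepOutcome) -> float:
    return sum(fn(o) for fn in REWARD_FNS)


@dataclass
class OversightOutcome:
    """Hypothetical counts of issues the Oversight Agent correctly flagged."""
    hallucinations_caught: int = 0
    policy_breaches_caught: int = 0
    stale_schema_caught: int = 0


def oversight_reward(o: OversightOutcome) -> float:
    return (15.0 * o.hallucinations_caught
            + 8.0 * o.policy_breaches_caught
            + 5.0 * o.stale_schema_caught)
```

For example, a step with two resolutions (one on time), one breach, one acted-on message, and one hallucinated call totals 20 + 7.5 - 5 + 6 - 8 = 20.5. Logging `reward_breakdown` alongside the total makes it visible when one term starts dominating, which is a common early sign of reward hacking.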
---
## Training Results
Real training run: 200 steps on a T4 GPU in 32 minutes


### Key findings
- **GRPO reward**: -1.0 → +1.5 (crossed zero, indicating the model is learning)
- **Curriculum**: advanced automatically from scenario_01 to scenario_03
- **Train loss**: -0.023
- **Model**: Qwen2.5-3B-Instruct, 4-bit quantized via Unsloth
- **Method**: GRPO via HuggingFace TRL
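GRPO (Group Relative Policy Optimization) samples several completions per prompt and uses each completion's reward relative to its own group as the advantage, so no learned value function is needed. TRL's `GRPOTrainer` does this internally; the sketch below only illustrates the normalization step. `group_relative_advantages` is a hypothetical helper, and it uses the population standard deviation plus a small `eps`, so TRL's exact normalization (and its options for disabling std scaling) may differ by version.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each completion's reward against the other completions for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every completion scored the same.
    return [(r - mean) / (std + eps) for r in rewards]
```

If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal, so reward spread across completions matters more than the absolute reward level.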
### What the curves show
- Episode score dropped at step 110, when the curriculum advanced to a harder scenario and challenged the agents
- Score recovered by step 200 as the agents adapted
- The curriculum difficulty staircase shows automatic advancement with no human intervention
- GRPO reward crossed from negative to positive, evidence that the policy is learning
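The README does not spell out the promotion rule. One common approach, sketched here as an assumption, is to advance when the rolling mean episode score over a fixed window reaches a threshold; `Curriculum`, `window`, `threshold`, and the `SCENARIOS` list are hypothetical (only `scenario_01` and `scenario_03` appear above).

```python
from __future__ import annotations

from collections import deque

# Hypothetical names for the 8 scenarios (difficulty 1 to 8).
SCENARIOS = [f"scenario_{i:02d}" for i in range(1, 9)]


class Curriculum:
    """Advance one level when the rolling mean score clears a threshold.

    window and threshold are illustrative defaults, not the project's values.
    """

    def __init__(self, window: int = 10, threshold: float = 0.5) -> None:
        self.level = 0
        self.window = window
        self.threshold = threshold
        self.scores: deque[float] = deque(maxlen=window)

    @property
    def scenario(self) -> str:
        return SCENARIOS[self.level]

    def record(self, score: float) -> bool:
        """Log an episode score; return True if this call advanced the curriculum."""
        self.scores.append(score)
        ready = len(self.scores) == self.window
        at_top = self.level == len(SCENARIOS) - 1
        if ready and not at_top and sum(self.scores) / self.window >= self.threshold:
            self.level += 1
            self.scores.clear()  # old-scenario scores should not count toward the next level
            return True
        return False
```

Clearing the window after each promotion forces the agents to earn the next step on the new scenario, which would produce a staircase-shaped difficulty curve with a score dip after each advance.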
---
## How to Run
### Run the environment locally
```bash
git clone https://huggingface.co/spaces/Anurag137/enterprise-ops-arena
cd enterprise-ops-arena
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860
```
### Test the API
```bash
# Health check
curl https://anurag137-enterprise-ops-arena.hf.space/health
# View all endpoints in the interactive API docs (`open` is macOS; use `xdg-open` on Linux)
open https://anurag137-enterprise-ops-arena.hf.space/docs
```
### Run training
```bash
git clone https://github.com/anuragverma025/Meta-Hackathon
cd Meta-Hackathon/enterprise_ops
pip install -e .
python -m enterprise_ops.train.main --scenario scenario_01 --steps 200
```
---
## Tech Stack
| Component | Technology |
|---|---|
| Environment | OpenEnv + FastAPI + SQLite |
| Schemas | Pydantic v2 |
| Training | HuggingFace TRL + GRPO |
| Model | Qwen2.5-3B-Instruct |
| Efficiency | Unsloth 4-bit quantization |
| Deployment | HuggingFace Spaces + Docker |
| UI | Gradio mounted on FastAPI |
---
## Themes Covered
- **Theme 1**: Multi-Agent Interactions
- **Theme 3.1**: World Modeling (Professional Tasks)
### Bonus prizes targeted
- Fleet AI: Scalable Oversight (OversightAgent)
- Halluminate: Multi-Actor Environments
- Scale AI: Sales/PM/IT enterprise workflows
- Scaler AI Labs: Multi-app enterprise RL
- Patronus AI: Schema drift + dynamic contracts
---
## Project Structure
|