| title: EnterpriseOps Arena | |
| emoji: π’ | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: docker | |
| pinned: false | |
# EnterpriseOps Arena

> Multi-agent RL environment where IT, Manager, Finance and Oversight agents
> collaborate to manage a simulated enterprise under partial observability,
> schema drift, and SLA pressure.

## Quick Links

- **HuggingFace Space**: https://huggingface.co/spaces/Anurag137/enterprise-ops-arena
- **Colab Notebook**: https://github.com/anuragverma025/Meta-Hackathon/blob/main/enterprise_ops/train/colab_notebook.ipynb
- **Blog Post**: https://github.com/anuragverma025/Meta-Hackathon/blob/main/BLOG.md
- **GitHub**: https://github.com/anuragverma025/Meta-Hackathon

---
## The Problem

Enterprise AI agents fail because they work in silos. The IT agent
resolving a critical server ticket does not know the Finance agent
just blocked the budget it needs. The Manager does not know which
tickets are about to breach SLA. No coordination = cascading failures.

We built an RL environment that trains LLM agents to coordinate
across departments, developing theory-of-mind reasoning through
reinforcement learning.

---
## The Environment

Four specialized LLM agents operate inside a simulated enterprise:

| Agent | Role | Sees |
|---|---|---|
| IT Agent | Resolves support tickets before SLA breach | Tickets + resource pool + inbox |
| Manager Agent | Allocates shared resources, coordinates tasks | All dept summaries + project tasks |
| Finance Agent | Approves budgets, blocks policy violations | Budget history + pending approvals |
| Oversight Agent | Monitors all agents, catches hallucinations | ALL tool call logs (full visibility) |
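
As a rough illustration of the "Sees" column above, partial observability can be sketched as a per-role filter over a shared state dictionary. The role names, state keys, and the `observe` helper below are hypothetical and only mirror the table; they are not taken from the project's code.

```python
from typing import Any

# Hypothetical mapping from agent role to the state keys it may observe,
# loosely following the "Sees" column of the table. Key names are illustrative.
VISIBLE_KEYS: dict[str, set[str]] = {
    "it": {"tickets", "resource_pool", "inbox"},
    "manager": {"dept_summaries", "project_tasks"},
    "finance": {"budget_history", "pending_approvals"},
    "oversight": {"tool_call_logs"},
}


def observe(role: str, state: dict[str, Any]) -> dict[str, Any]:
    """Return only the slice of the global state that `role` is allowed to see."""
    allowed = VISIBLE_KEYS[role]
    return {key: value for key, value in state.items() if key in allowed}


# A toy global state; the real environment state is assumed to be richer.
state = {
    "tickets": [{"id": 1, "priority": "critical"}],
    "resource_pool": {"servers": 3},
    "inbox": [],
    "budget_history": [1200, 800],
    "pending_approvals": [{"id": 7, "amount": 500}],
    "tool_call_logs": [],
}

it_view = observe("it", state)
finance_view = observe("finance", state)
```

In this sketch the IT view contains no budget data, which is the situation the problem statement describes: the IT agent cannot see that Finance has blocked a budget unless Finance tells it.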
### Key environment features

- **Partial observability**: each agent sees only its own department's view (the Oversight Agent is the exception and sees all tool call logs)
- **5 mock enterprise APIs**: get_tickets, resolve_ticket, allocate_resource, approve_budget, get_project_status
- **Schema drift**: API fields mutate every 20 steps, forcing real adaptation
- **8 scenarios**: difficulty 1 to 8, from simple IT tasks to full enterprise chaos
- **Message bus**: agents coordinate by sending structured messages
- **Anti-reward-hacking**: timeout, loop detection, state locks, oversight monitoring
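
To make the schema drift feature concrete, here is a minimal sketch of how API field names could change every 20 steps. The field names, the rename schedule, and the choice to switch exactly at steps 20, 40, and so on are assumptions for illustration; the actual drift rules are not described in this README.

```python
DRIFT_INTERVAL = 20  # from the feature list: fields mutate every 20 steps

# Hypothetical rename schedule: canonical field name -> name exposed by the API.
FIELD_VERSIONS: list[dict[str, str]] = [
    {"ticket_id": "ticket_id", "priority": "priority"},
    {"ticket_id": "id", "priority": "severity"},
    {"ticket_id": "ref", "priority": "urgency"},
]


def schema_at(step: int) -> dict[str, str]:
    """Pick the active field mapping for a given environment step."""
    version = (step // DRIFT_INTERVAL) % len(FIELD_VERSIONS)
    return FIELD_VERSIONS[version]


def render_ticket(ticket: dict, step: int) -> dict:
    """Expose a canonical ticket under whatever field names are active at `step`."""
    mapping = schema_at(step)
    return {mapping[name]: value for name, value in ticket.items()}


ticket = {"ticket_id": 42, "priority": "high"}
early = render_ticket(ticket, step=5)   # first schema version
late = render_ticket(ticket, step=25)   # second schema version
```

An agent that hard-codes `priority` would read a missing field at step 25 in this sketch, which is the kind of stale schema usage the Oversight Agent is meant to catch.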
---

## Reward Design

Four independent reward functions (composable, hard to game):

| Function | Signal |
|---|---|
| task_completion | +10 per resolved ticket/task, verified by state diff |
| sla_adherence | +7.5 before deadline, -5 on breach |
| coordination_bonus | +6 when a message leads to a correct action on the next step |
| hallucination_penalty | -8 for calling non-existent API fields |

The Oversight Agent earns +15 for catching hallucinations, +8 for
catching policy breaches, and +5 for flagging stale schema usage.
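
The reward table above can be sketched as four small functions that are combined at the end. The `StepOutcome` container and the unweighted sum in `total_reward` are assumptions made for this example; only the function names and point values come from the table.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Hypothetical per-step summary of what an agent did."""
    resolved: int = 0            # tickets/tasks resolved (verified by state diff)
    met_sla: int = 0             # items finished before their deadline
    breached_sla: int = 0        # items that breached their deadline
    coordinated: int = 0         # messages that led to a correct next action
    hallucinated_calls: int = 0  # calls using non-existent API fields


def task_completion(o: StepOutcome) -> float:
    return 10.0 * o.resolved


def sla_adherence(o: StepOutcome) -> float:
    return 7.5 * o.met_sla - 5.0 * o.breached_sla


def coordination_bonus(o: StepOutcome) -> float:
    return 6.0 * o.coordinated


def hallucination_penalty(o: StepOutcome) -> float:
    return -8.0 * o.hallucinated_calls


REWARD_FUNCTIONS = [
    task_completion,
    sla_adherence,
    coordination_bonus,
    hallucination_penalty,
]


def total_reward(o: StepOutcome) -> float:
    """Assumed combination rule: a plain unweighted sum of the four signals."""
    return sum(fn(o) for fn in REWARD_FUNCTIONS)


outcome = StepOutcome(resolved=1, met_sla=1, coordinated=1, hallucinated_calls=1)
reward = total_reward(outcome)  # 10 + 7.5 + 6 - 8 = 15.5
```

Keeping each signal in its own function means a single term can be inspected or disabled without touching the others, which is one plausible reading of "composable".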
---

## Training Results

Real training run: 200 steps on a T4 GPU in 32 minutes.





### Key findings

- **GRPO reward**: -1.0 → +1.5 (crossed zero, which suggests the model is learning)
- **Curriculum**: advanced automatically from scenario_01 → scenario_03
- **Train loss**: -0.023
- **Model**: Qwen2.5-3B-Instruct, 4-bit quantized via Unsloth
- **Method**: GRPO via HuggingFace TRL
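
For readers new to GRPO: it scores each sampled completion relative to the other completions generated for the same prompt, rather than against a learned value function. The sketch below shows that group-relative normalization in plain Python. It is an illustration of the idea, not TRL's code; the use of the population standard deviation and the `eps` term are choices made for this example, and implementations differ on the exact estimator.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example group: four completions for one prompt, with made-up rewards.
rewards = [-1.0, 0.5, 1.5, -1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean get a positive advantage and are reinforced; those below the mean are pushed down. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.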
### What the curves show

- Episode score dropped at step 110, when the curriculum advanced to a harder scenario and the agents were challenged
- Score recovered by step 200 as the agents adapted
- The curriculum difficulty staircase shows automatic advancement with no human intervention
- GRPO reward crossed from negative to positive, evidence that the policy is learning
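
One simple way to get the staircase behaviour described above is to advance to the next scenario once the recent average episode score clears a threshold. The sketch below assumes exactly that rule; the `Curriculum` class, the threshold, and the window size are hypothetical, and the project's actual advancement criterion may differ.

```python
from collections import deque
from statistics import mean


class Curriculum:
    """Advance through scenarios when the rolling mean score reaches a threshold."""

    def __init__(self, scenarios: list[str], threshold: float, window: int) -> None:
        self.scenarios = scenarios
        self.threshold = threshold
        self.recent: deque[float] = deque(maxlen=window)
        self.index = 0

    @property
    def current(self) -> str:
        return self.scenarios[self.index]

    def record(self, episode_score: float) -> str:
        """Log one episode score and return the scenario to use next."""
        self.recent.append(episode_score)
        window_full = len(self.recent) == self.recent.maxlen
        has_next = self.index < len(self.scenarios) - 1
        if window_full and has_next and mean(self.recent) >= self.threshold:
            self.index += 1
            self.recent.clear()  # start a fresh window on the harder scenario
        return self.current


# Hypothetical settings; scenario names follow the README's naming.
curriculum = Curriculum(
    ["scenario_01", "scenario_02", "scenario_03"], threshold=1.0, window=3
)
history = [curriculum.record(score) for score in [0.5, 1.2, 1.5, 0.2, 1.4, 1.6]]
```

Clearing the window after each advance means scores earned on the easier scenario do not count toward the next promotion, so a temporary dip right after advancing does not trigger another jump.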
---

## How to Run

### Run the environment locally

```bash
git clone https://huggingface.co/spaces/Anurag137/enterprise-ops-arena
cd enterprise-ops-arena
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860
```
### Test the API

```bash
# Health check
curl https://anurag137-enterprise-ops-arena.hf.space/health

# View all endpoints in a browser (`open` is macOS-only; use `xdg-open` on Linux)
open https://anurag137-enterprise-ops-arena.hf.space/docs
```
### Run training

```bash
git clone https://github.com/anuragverma025/Meta-Hackathon
cd Meta-Hackathon/enterprise_ops
pip install -e .
python -m enterprise_ops.train.main --scenario scenario_01 --steps 200
```

---
## Tech Stack

| Component | Technology |
|---|---|
| Environment | OpenEnv + FastAPI + SQLite |
| Schemas | Pydantic v2 |
| Training | HuggingFace TRL + GRPO |
| Model | Qwen2.5-3B-Instruct |
| Efficiency | Unsloth 4-bit quantization |
| Deployment | HuggingFace Spaces + Docker |
| UI | Gradio mounted on FastAPI |

---
## Themes Covered

- **Theme 1**: Multi-Agent Interactions
- **Theme 3.1**: World Modeling: Professional Tasks

### Bonus prizes targeted

- Fleet AI: Scalable Oversight (OversightAgent)
- Halluminate: Multi-Actor Environments
- Scale AI: Sales/PM/IT enterprise workflows
- Scaler AI Labs: Multi-app enterprise RL
- Patronus AI: Schema drift + dynamic contracts

---

## Project Structure