---
title: EnterpriseOps Arena
emoji: 🏢
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
---
# EnterpriseOps Arena 🏢
Multi-agent RL environment where IT, Manager, Finance and Oversight agents collaborate to manage a simulated enterprise under partial observability, schema drift, and SLA pressure.
## Quick Links
- HuggingFace Space: https://huggingface.co/spaces/Anurag137/enterprise-ops-arena
- Colab Notebook: https://github.com/anuragverma025/Meta-Hackathon/blob/main/enterprise_ops/train/colab_notebook.ipynb
- Blog Post: https://github.com/anuragverma025/Meta-Hackathon/blob/main/BLOG.md
- GitHub: https://github.com/anuragverma025/Meta-Hackathon
## The Problem
Enterprise AI agents fail because they work in silos. The IT agent resolving a critical server ticket does not know that the Finance agent just blocked the budget it needs. The Manager does not know which tickets are about to breach their SLA. Without coordination, failures cascade.
We built an RL environment that trains LLM agents to coordinate across departments, developing theory-of-mind reasoning through reinforcement learning.
## The Environment
4 specialized LLM agents operate inside a simulated enterprise:
| Agent | Role | Sees |
|---|---|---|
| IT Agent | Resolves support tickets before SLA breach | Tickets + resource pool + inbox |
| Manager Agent | Allocates shared resources, coordinates tasks | All dept summaries + project tasks |
| Finance Agent | Approves budgets, blocks policy violations | Budget history + pending approvals |
| Oversight Agent | Monitors all agents, catches hallucinations | ALL tool call logs (full visibility) |
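Partial observability can be pictured as a per-role filter over one global state. The sketch below is a minimal illustration, assuming the state is a flat dict; the key names, `VISIBLE_KEYS`, and `observe` are hypothetical and not taken from the repository.

```python
from typing import Any

# Hypothetical role -> visible state keys, following the "Sees" column.
# These key names are illustrative assumptions, not the project's schema.
VISIBLE_KEYS: dict[str, tuple[str, ...]] = {
    "it": ("tickets", "resource_pool", "inbox"),
    "manager": ("dept_summaries", "project_tasks"),
    "finance": ("budget_history", "pending_approvals"),
    "oversight": ("tool_call_logs",),  # full log visibility across all agents
}


def observe(state: dict[str, Any], role: str) -> dict[str, Any]:
    """Return only the slice of the global state that `role` may see."""
    allowed = VISIBLE_KEYS[role]
    return {key: state[key] for key in allowed if key in state}
```

Under this sketch, `observe(state, "finance")` never contains `tickets`, so the Finance agent only learns about a critical IT ticket through a message from another agent.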
### Key environment features
- Partial observability: each departmental agent sees only its own department's data
- 5 mock enterprise APIs: `get_tickets`, `resolve_ticket`, `allocate_resource`, `approve_budget`, `get_project_status`
- Schema drift: API fields mutate every 20 steps, forcing agents to adapt
- 8 scenarios: difficulty 1 to 8, from simple IT tasks to full enterprise chaos
- Message bus: agents coordinate by sending structured messages
- Anti-reward-hacking: timeouts, loop detection, state locks, and oversight monitoring
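A minimal sketch of schema drift, assuming each API field has a small set of alternative wire names and that one randomly chosen field is renamed at each 20-step boundary. `SchemaDrift`, its methods, and the alias lists are hypothetical; the README only states that fields mutate every 20 steps.

```python
import random

DRIFT_INTERVAL = 20  # the README states that fields mutate every 20 steps


class SchemaDrift:
    """Hypothetical sketch: rename one API field every DRIFT_INTERVAL steps."""

    def __init__(self, aliases: dict[str, list[str]], seed: int = 0) -> None:
        # Maps each canonical field to the alternative names it may drift to.
        self.aliases = aliases
        # Current wire name for each canonical field (starts undrifted).
        self.current = {name: name for name in aliases}
        self.rng = random.Random(seed)

    def step(self, t: int) -> None:
        """Advance to step t; mutate one field on steps 20, 40, 60, ..."""
        if t > 0 and t % DRIFT_INTERVAL == 0:
            canonical = self.rng.choice(sorted(self.aliases))
            options = [n for n in self.aliases[canonical] if n != self.current[canonical]]
            if options:
                self.current[canonical] = self.rng.choice(options)

    def render(self, record: dict) -> dict:
        """Serialize a canonical record using the current (drifted) field names."""
        return {self.current.get(k, k): v for k, v in record.items()}

    def is_stale(self, field_name: str) -> bool:
        """True if `field_name` belonged to the schema at some point but is not valid now."""
        known = set(self.aliases) | {n for names in self.aliases.values() for n in names}
        return field_name in known and field_name not in self.current.values()
```

In this sketch, an agent that keeps sending a pre-drift name (one for which `is_stale` returns `True`) is making the kind of stale-schema call the Oversight Agent is rewarded for flagging.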
## Reward Design
4 independent reward functions (composable, hard to game):
| Function | Signal |
|---|---|
| task_completion | +10 per resolved ticket/task, verified by state diff |
| sla_adherence | +7.5 before deadline, -5 on breach |
| coordination_bonus | +6 when a message leads to the correct action on the next step |
| hallucination_penalty | -8 for calling non-existent API fields |
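The point values in the table can be expressed as four small, independent functions over a per-step outcome. This sketch assumes the total reward is their plain sum and that ticket resolution is detected by diffing status snapshots; `StepOutcome`, `resolved_by_diff`, and `total_reward` are hypothetical names, not the project's code.

```python
from dataclasses import dataclass, field


@dataclass
class StepOutcome:
    """Hypothetical per-step summary the reward functions read from."""
    resolved_ids: set[str] = field(default_factory=set)  # found via state diff
    resolved_on_time: int = 0     # resolutions that beat their SLA deadline
    sla_breaches: int = 0         # items that crossed their deadline this step
    coordinated_actions: int = 0  # correct actions triggered by a prior message
    invalid_field_calls: int = 0  # calls that referenced non-existent API fields


def resolved_by_diff(before: dict[str, str], after: dict[str, str]) -> set[str]:
    """IDs whose status became 'resolved' between two state snapshots."""
    return {k for k, s in after.items() if s == "resolved" and before.get(k) != "resolved"}


def task_completion(o: StepOutcome) -> float:
    return 10.0 * len(o.resolved_ids)


def sla_adherence(o: StepOutcome) -> float:
    return 7.5 * o.resolved_on_time - 5.0 * o.sla_breaches


def coordination_bonus(o: StepOutcome) -> float:
    return 6.0 * o.coordinated_actions


def hallucination_penalty(o: StepOutcome) -> float:
    return -8.0 * o.invalid_field_calls


REWARD_FNS = (task_completion, sla_adherence, coordination_bonus, hallucination_penalty)


def total_reward(o: StepOutcome) -> float:
    """Assumed composition: a plain sum of the four independent signals."""
    return sum(fn(o) for fn in REWARD_FNS)
```

Keeping the signals separate fits how TRL's `GRPOTrainer` accepts a list of reward functions via `reward_funcs` and combines them as a weighted sum, although TRL reward functions take prompts and completions rather than a state object.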
The Oversight Agent earns +15 for catching hallucinations, +8 for policy breaches, +5 for stale schema usage.
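A sketch of the Oversight Agent's reward, assuming it flags entries in the tool-call log by index and is paid only when a flag matches the ground-truth violation type. The ground-truth labels, `OVERSIGHT_REWARDS`, and `oversight_reward` are assumptions; the README does not say how unmatched flags are scored, so this version simply gives them nothing.

```python
# Point values from the README; the matching rule below is an assumption.
OVERSIGHT_REWARDS: dict[str, float] = {
    "hallucination": 15.0,
    "policy_breach": 8.0,
    "stale_schema": 5.0,
}


def oversight_reward(flags: dict[int, str], ground_truth: dict[int, str]) -> float:
    """Sum rewards for flags (log index -> violation type) that match ground truth."""
    return sum(
        (
            OVERSIGHT_REWARDS.get(kind, 0.0)
            for idx, kind in flags.items()
            if ground_truth.get(idx) == kind  # wrong or spurious flags earn nothing
        ),
        0.0,
    )
```

Paying only for verified flags is one way to stop the overseer from gaming its reward by flagging everything.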
## Training Results
Real training run: 200 steps on a T4 GPU (32 minutes)
### Key findings
- GRPO reward: -1.0 → +1.5 (crossed zero, indicating the model is learning)
- Curriculum: advanced automatically from scenario_01 to scenario_03
- Train loss: -0.023
- Model: Qwen2.5-3B-Instruct, 4-bit quantized via Unsloth
- Method: GRPO via HuggingFace TRL
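GRPO scores each sampled completion against the other completions drawn for the same prompt, which removes the need for a separate value model. A minimal sketch of that group-relative advantage follows: each reward is centered on the group mean and divided by the group's sample standard deviation plus a small epsilon. This mirrors the standard GRPO formulation, but TRL exposes options that change the scaling, so treat it as illustrative rather than TRL's exact code.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantages for one group of completions sampled from the same prompt.

    Each reward is centered on the group mean and scaled by the group's standard
    deviation, so completions are judged only against their siblings.
    """
    if len(rewards) < 2:
        raise ValueError("GRPO needs at least two completions per prompt")
    mean = statistics.fmean(rewards)
    std = statistics.stdev(rewards)  # sample standard deviation
    return [(r - mean) / (std + eps) for r in rewards]
```

Because advantages are zero-mean within each group, the "GRPO reward" figure above (the mean raw reward) and the update signal are different quantities: the reward crossing zero reflects better absolute outcomes, while the gradient comes from differences between completions in a group. A group whose completions all score the same contributes no signal.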
### What the curves show
- Episode score dropped at step 110, when the curriculum advanced to a harder scenario and the agents were challenged
- Score recovered by step 200 as the agents adapted
- The curriculum difficulty staircase shows automatic advancement with no human intervention
- GRPO reward crossed from negative to positive, evidence that the policy is learning
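The README does not spell out the advancement rule, so the sketch below assumes a common one: move to the next scenario once the rolling mean episode score over a fixed window clears a threshold, then reset the window. `Curriculum`, `threshold`, and `window` are hypothetical. Resetting on each advance yields a staircase, and the first episodes on a harder scenario tend to score lower, which is one way a dip like the one at step 110 can arise.

```python
from collections import deque


class Curriculum:
    """Hypothetical threshold curriculum over an ordered list of scenarios."""

    def __init__(self, scenarios: list[str], threshold: float = 0.5, window: int = 10) -> None:
        self.scenarios = scenarios
        self.threshold = threshold
        self.level = 0
        self.scores: deque[float] = deque(maxlen=window)  # recent episode scores

    @property
    def current(self) -> str:
        return self.scenarios[self.level]

    def record(self, score: float) -> str:
        """Log an episode score; advance if the full window's mean clears the threshold."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        mean = sum(self.scores) / len(self.scores)
        if window_full and mean >= self.threshold and self.level < len(self.scenarios) - 1:
            self.level += 1
            self.scores.clear()  # judge the harder scenario on fresh scores only
        return self.current
```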
## How to Run
### Run the environment locally
```bash
git clone https://huggingface.co/spaces/Anurag137/enterprise-ops-arena
cd enterprise-ops-arena
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860
```
### Test the API
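```bash
# Health check
curl https://anurag137-enterprise-ops-arena.hf.space/health

# View all endpoints in the interactive API docs
# (`open` is macOS; use `xdg-open` on Linux or paste the URL into a browser)
open https://anurag137-enterprise-ops-arena.hf.space/docs
```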
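The same health check can be scripted from Python with only the standard library. This is a sketch: `check_health` is a hypothetical helper, and since the README does not document the `/health` response format, it returns the raw body instead of parsing JSON.

```python
import urllib.request

SPACE_URL = "https://anurag137-enterprise-ops-arena.hf.space"


def check_health(base_url: str = SPACE_URL, timeout: float = 10.0) -> tuple[int, str]:
    """GET <base_url>/health and return (HTTP status code, response body).

    Non-2xx responses raise urllib.error.HTTPError.
    """
    url = base_url.rstrip("/") + "/health"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")


# Usage (makes a network call):
# status, body = check_health()
# print(status, body)
```

Against a local server started with the `uvicorn` command above, call `check_health("http://localhost:7860")` instead.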
### Run training
```bash
git clone https://github.com/anuragverma025/Meta-Hackathon
cd Meta-Hackathon/enterprise_ops
pip install -e .
python -m enterprise_ops.train.main --scenario scenario_01 --steps 200
```
## Tech Stack
| Component | Technology |
|---|---|
| Environment | OpenEnv + FastAPI + SQLite |
| Schemas | Pydantic v2 |
| Training | HuggingFace TRL + GRPO |
| Model | Qwen2.5-3B-Instruct |
| Efficiency | Unsloth 4-bit quantization |
| Deployment | HuggingFace Spaces + Docker |
| UI | Gradio mounted on FastAPI |
## Themes Covered
- Theme 1: Multi-Agent Interactions
- Theme 3.1: World Modeling (Professional Tasks)
### Bonus prizes targeted
- Fleet AI: Scalable Oversight (OversightAgent)
- Halluminate: Multi-Actor Environments
- Scale AI: Sales/PM/IT enterprise workflows
- Scaler AI Labs: Multi-app enterprise RL
- Patronus AI: Schema drift + dynamic contracts

