title: EnterpriseOps Arena
emoji: 🏢
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false

EnterpriseOps Arena 🏢

Multi-agent RL environment where IT, Manager, Finance and Oversight agents collaborate to manage a simulated enterprise under partial observability, schema drift, and SLA pressure.

Quick Links

🚀 HuggingFace Space: https://huggingface.co/spaces/Anurag137/enterprise-ops-arena
📓 Colab Notebook: https://github.com/anuragverma025/Meta-Hackathon/blob/main/enterprise_ops/train/colab_notebook.ipynb
📝 Blog Post: https://github.com/anuragverma025/Meta-Hackathon/blob/main/BLOG.md
💻 GitHub: https://github.com/anuragverma025/Meta-Hackathon

The Problem

Enterprise AI agents fail because they work in silos. The IT agent resolving a critical server ticket does not know the Finance agent just blocked the budget it needs. The Manager does not know which tickets are about to breach SLA. Without coordination, these blind spots turn into cascading failures.

We built an RL environment that trains LLM agents to coordinate across departments — developing theory-of-mind reasoning through reinforcement learning.

The Environment

4 specialized LLM agents operate inside a simulated enterprise:

Agent	Role	Sees
IT Agent	Resolves support tickets before SLA breach	Tickets + resource pool + inbox
Manager Agent	Allocates shared resources, coordinates tasks	All dept summaries + project tasks
Finance Agent	Approves budgets, blocks policy violations	Budget history + pending approvals
Oversight Agent	Monitors all agents, catches hallucinations	ALL tool call logs (full visibility)
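
The "Sees" column can be pictured as a per-role filter over one shared state. The sketch below is illustrative only: the state keys, the `VISIBLE_KEYS` mapping, and the `observe` function are hypothetical names chosen for this example, not the environment's actual API.

```python
from typing import Any

# Hypothetical mapping from agent role to the state keys it may read,
# mirroring the "Sees" column of the table. All key names are assumptions.
VISIBLE_KEYS: dict[str, set[str]] = {
    "it": {"tickets", "resource_pool", "inbox"},
    "manager": {"dept_summaries", "project_tasks"},
    "finance": {"budget_history", "pending_approvals"},
    "oversight": {"tool_call_logs"},
}


def observe(role: str, state: dict[str, Any]) -> dict[str, Any]:
    """Return only the slice of the global state that `role` may see."""
    allowed = VISIBLE_KEYS[role]
    return {key: value for key, value in state.items() if key in allowed}


# Made-up global state for illustration.
state = {
    "tickets": [{"id": 1, "sla_steps_left": 3}],
    "resource_pool": {"servers": 2},
    "inbox": [],
    "budget_history": [1200, 800],
    "pending_approvals": [{"id": 7, "amount": 500}],
    "tool_call_logs": [],
}

it_view = observe("it", state)            # no budget data in here
finance_view = observe("finance", state)  # no tickets in here
```

Under this sketch the IT agent cannot learn about a blocked budget from its own observation; that information would have to arrive through a message.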

Key environment features

Partial observability: each agent sees only its own role's view from the table above; only the Oversight Agent sees all tool call logs
5 mock enterprise APIs — get_tickets, resolve_ticket, allocate_resource, approve_budget, get_project_status
Schema drift — API fields mutate every 20 steps, forcing real adaptation
8 scenarios — difficulty 1 to 8, from simple IT tasks to full enterprise chaos
Message bus — agents coordinate by sending structured messages
Anti-reward-hacking — timeout, loop detection, state locks, oversight monitoring
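
To make the schema drift feature concrete, here is a minimal sketch that assumes drift takes the form of a field being renamed every 20 steps. The field names, the rename cycle, and the helper functions are all hypothetical; the real environment may mutate its APIs differently.

```python
DRIFT_INTERVAL = 20  # from the feature list: API fields mutate every 20 steps

# Hypothetical rename cycle for one field of a ticket record.
PRIORITY_FIELD_NAMES = ["priority", "severity", "urgency_level"]


def current_field_name(step: int) -> str:
    """Name of the priority field in effect at the given step."""
    version = step // DRIFT_INTERVAL
    return PRIORITY_FIELD_NAMES[version % len(PRIORITY_FIELD_NAMES)]


def get_ticket(step: int) -> dict:
    """Mock API response whose schema depends on the current step."""
    return {"id": 42, current_field_name(step): "high"}


def read_priority(ticket: dict, assumed_field: str) -> str:
    """Read the priority using the field name the agent believes is current."""
    if assumed_field not in ticket:
        # An agent relying on a stale schema ends up here.
        raise KeyError(f"stale schema: {assumed_field!r} not in {sorted(ticket)}")
    return ticket[assumed_field]
```

An agent that cached `"priority"` at step 5 would succeed until step 19 and fail from step 20 onward, which is the kind of stale-schema usage the Oversight Agent is rewarded for catching.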

Reward Design

4 independent reward functions (composable, hard to game):

Function	Signal
task_completion	+10 per resolved ticket/task, verified by state diff
sla_adherence	+7.5 when resolved before the deadline, -5 on breach
coordination_bonus	+6 when a message leads to the correct action on the next step
hallucination_penalty	-8 for calling non-existent API fields

The Oversight Agent earns +15 for catching hallucinations, +8 for policy breaches, +5 for stale schema usage.
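
As a rough illustration of how the four signals compose, the sketch below uses the point values from the table. The `StepOutcome` fields and the combination rule (an unweighted sum) are assumptions made for this example; the README does not state how the environment actually aggregates them.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Hypothetical per-step summary; field names are assumptions."""
    resolved: int = 0             # tickets/tasks resolved this step
    met_sla: int = 0              # items resolved before their deadline
    breached_sla: int = 0         # deadlines missed
    coordinated_actions: int = 0  # messages that led to a correct next action
    hallucinated_calls: int = 0   # calls using non-existent API fields


def task_completion(o: StepOutcome) -> float:
    return 10.0 * o.resolved


def sla_adherence(o: StepOutcome) -> float:
    return 7.5 * o.met_sla - 5.0 * o.breached_sla


def coordination_bonus(o: StepOutcome) -> float:
    return 6.0 * o.coordinated_actions


def hallucination_penalty(o: StepOutcome) -> float:
    return -8.0 * o.hallucinated_calls


REWARD_FUNCTIONS = [task_completion, sla_adherence, coordination_bonus, hallucination_penalty]


def total_reward(o: StepOutcome) -> float:
    """Assumed combination rule: an unweighted sum of the four signals."""
    return sum(fn(o) for fn in REWARD_FUNCTIONS)


outcome = StepOutcome(resolved=1, met_sla=1, coordinated_actions=1, hallucinated_calls=1)
print(total_reward(outcome))  # 10 + 7.5 + 6 - 8 = 15.5
```

Keeping each signal in its own function means any one of them can be logged, reweighted, or disabled without touching the others.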

Training Results

Real training run: 200 steps on a T4 GPU, 32 minutes

Key findings

GRPO reward: -1.0 → +1.5 (crossed zero, which suggests the model is learning)
Curriculum: Advanced automatically from scenario_01 → scenario_03
Train loss: -0.023
Model: Qwen2.5-3B-Instruct, 4-bit quantized via Unsloth
Method: GRPO via HuggingFace TRL
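
For readers new to GRPO: it scores each sampled completion relative to the other completions drawn for the same prompt, rather than against a learned value function. The sketch below shows only that normalization step in plain Python. It is not code from this project or from TRL, the reward values are made up, and implementations differ in details such as which standard deviation estimator and epsilon they use.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each reward against its group's mean and standard deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Illustrative rewards for four completions sampled from one prompt.
group = [-1.0, 0.5, 1.5, -1.0]
advantages = group_relative_advantages(group)
```

Completions that beat their group's average get a positive advantage and are reinforced; a group where every completion scores the same produces zero advantage and therefore no learning signal.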

What the curves show

Episode score dropped at step 110, when the curriculum advanced to a harder scenario and the agents were challenged
Score recovered by step 200, showing that the agents adapted
Curriculum difficulty staircase shows automatic advancement with no human intervention
GRPO reward crossed from negative to positive, which is evidence of learning

How to Run

Run the environment locally

git clone https://huggingface.co/spaces/Anurag137/enterprise-ops-arena
cd enterprise-ops-arena
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860

Test the API

# Health check
curl https://anurag137-enterprise-ops-arena.hf.space/health

# View all endpoints (open is macOS-only; on Linux use xdg-open, or paste the URL into a browser)
open https://anurag137-enterprise-ops-arena.hf.space/docs

Run training

git clone https://github.com/anuragverma025/Meta-Hackathon
cd Meta-Hackathon/enterprise_ops
pip install -e .
python -m enterprise_ops.train.main --scenario scenario_01 --steps 200

Tech Stack

Component	Technology
Environment	OpenEnv + FastAPI + SQLite
Schemas	Pydantic v2
Training	HuggingFace TRL + GRPO
Model	Qwen2.5-3B-Instruct
Efficiency	Unsloth 4-bit quantization
Deployment	HuggingFace Spaces + Docker
UI	Gradio mounted on FastAPI

Themes Covered

Theme 1 — Multi-Agent Interactions
Theme 3.1 — World Modeling: Professional Tasks

Bonus prizes targeted

Fleet AI — Scalable Oversight (OversightAgent)
Halluminate — Multi-Actor Environments
Scale AI — Sales/PM/IT enterprise workflows
Scaler AI Labs — Multi-app enterprise RL
Patronus AI — Schema drift + dynamic contracts