title: EnterpriseOps Arena
emoji: 🏒
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false

EnterpriseOps Arena 🏒

Multi-agent RL environment where IT, Manager, Finance and Oversight agents collaborate to manage a simulated enterprise under partial observability, schema drift, and SLA pressure.

Quick Links


The Problem

Enterprise AI agents fail because they work in silos. The IT agent resolving a critical server ticket does not know the Finance agent just blocked the budget it needs. The Manager does not know which tickets are about to breach SLA. No coordination = cascading failures.

We built an RL environment that trains LLM agents to coordinate across departments, developing theory-of-mind reasoning through reinforcement learning.


The Environment

4 specialized LLM agents operate inside a simulated enterprise:

| Agent | Role | Sees |
| --- | --- | --- |
| IT Agent | Resolves support tickets before SLA breach | Tickets + resource pool + inbox |
| Manager Agent | Allocates shared resources, coordinates tasks | All dept summaries + project tasks |
| Finance Agent | Approves budgets, blocks policy violations | Budget history + pending approvals |
| Oversight Agent | Monitors all agents, catches hallucinations | ALL tool call logs (full visibility) |
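
As a rough illustration of how per-agent visibility could be enforced, the sketch below filters a global state dictionary down to the keys an agent may observe. The `VISIBLE_KEYS` mapping, the key names, and the `observe` helper are hypothetical names chosen to mirror the "Sees" column; they are not taken from the project's code.

```python
from typing import Any, Dict, Set

# Hypothetical mapping from agent name to the state keys it may observe,
# loosely following the "Sees" column. Key names are illustrative only.
VISIBLE_KEYS: Dict[str, Set[str]] = {
    "it": {"tickets", "resource_pool", "inbox"},
    "manager": {"dept_summaries", "project_tasks"},
    "finance": {"budget_history", "pending_approvals"},
    "oversight": {"tool_call_logs"},
}


def observe(agent: str, state: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the slice of the global state that `agent` may see."""
    allowed = VISIBLE_KEYS[agent]
    return {key: value for key, value in state.items() if key in allowed}


# Example global state with one entry per department-owned key.
state = {
    "tickets": [{"id": 1, "priority": "high"}],
    "resource_pool": {"servers": 3},
    "inbox": [],
    "budget_history": [1200, 900],
    "pending_approvals": [{"id": 7, "amount": 500}],
    "tool_call_logs": ["get_tickets()"],
}

it_view = observe("it", state)
finance_view = observe("finance", state)
```

Here `it_view` contains tickets, the resource pool and the inbox but no budget data, which is the kind of blind spot the agents have to bridge through messaging.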

Key environment features

  • Partial observability: each agent sees only its department
  • 5 mock enterprise APIs: get_tickets, resolve_ticket, allocate_resource, approve_budget, get_project_status
  • Schema drift: API fields mutate every 20 steps, forcing real adaptation
  • 8 scenarios: difficulty 1 to 8, from simple IT tasks to full enterprise chaos
  • Message bus: agents coordinate by sending structured messages
  • Anti-reward-hacking: timeout, loop detection, state locks, oversight monitoring
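
To make the schema drift feature concrete, here is a minimal sketch of one way a field could be renamed on a fixed schedule. Only the 20-step interval comes from the list above; the field names, the rename cycle, and the helper functions are assumptions made for illustration, not the environment's actual mutation logic.

```python
DRIFT_INTERVAL = 20  # from the feature list: API fields mutate every 20 steps

# Hypothetical rename cycle for a single ticket field (an assumption).
FIELD_VARIANTS = ["priority", "severity", "urgency"]


def current_field_name(step: int) -> str:
    """Name under which the priority value is exposed at `step`."""
    version = step // DRIFT_INTERVAL
    return FIELD_VARIANTS[version % len(FIELD_VARIANTS)]


def apply_drift(ticket: dict, step: int) -> dict:
    """Return a copy of `ticket` with its canonical 'priority' key renamed.

    Expects `ticket` to contain a 'priority' key.
    """
    drifted = {k: v for k, v in ticket.items() if k != "priority"}
    drifted[current_field_name(step)] = ticket["priority"]
    return drifted


ticket = {"id": 1, "priority": "high"}
early = apply_drift(ticket, step=5)    # still uses "priority"
later = apply_drift(ticket, step=20)   # now exposed as "severity"
```

An agent that keeps reading `priority` after step 20 would be using a stale schema, which is the behavior the environment is meant to train away.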

Reward Design

4 independent reward functions (composable, hard to game):

| Function | Signal |
| --- | --- |
| task_completion | +10 per resolved ticket/task, verified by state diff |
| sla_adherence | +7.5 before deadline, -5 on breach |
| coordination_bonus | +6 when message leads to correct action next step |
| hallucination_penalty | -8 for calling non-existent API fields |
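
The four signals can be sketched as independent functions over a per-step outcome record. The numeric values come from the table; the `StepOutcome` fields, the per-item counting, and the choice to sum the four terms are assumptions made for this example rather than the project's confirmed implementation.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Hypothetical per-step counts that the reward functions read."""
    resolved_items: int = 0        # tickets/tasks verified as resolved
    sla_met: int = 0               # items resolved before their deadline
    sla_breached: int = 0          # items that passed their deadline
    coordinated_actions: int = 0   # messages followed by a correct action
    hallucinated_calls: int = 0    # calls using non-existent API fields


def task_completion(o: StepOutcome) -> float:
    return 10.0 * o.resolved_items


def sla_adherence(o: StepOutcome) -> float:
    return 7.5 * o.sla_met - 5.0 * o.sla_breached


def coordination_bonus(o: StepOutcome) -> float:
    return 6.0 * o.coordinated_actions


def hallucination_penalty(o: StepOutcome) -> float:
    return -8.0 * o.hallucinated_calls


REWARD_FUNCTIONS = [
    task_completion,
    sla_adherence,
    coordination_bonus,
    hallucination_penalty,
]


def total_reward(o: StepOutcome) -> float:
    """Sum the independent signals (summation is an assumption here)."""
    return sum(fn(o) for fn in REWARD_FUNCTIONS)


on_time = StepOutcome(resolved_items=1, sla_met=1, coordinated_actions=1)
late = StepOutcome(resolved_items=1, sla_breached=1)
```

Keeping each signal in its own function makes it easy to log them separately, so a rise in total reward can be traced to a specific behavior.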

The Oversight Agent earns +15 for catching hallucinations, +8 for policy breaches, +5 for stale schema usage.
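
One plausible way to classify a logged tool call and score the Oversight Agent is sketched below. The three reward values are from the sentence above; the `audit_call` rule (unknown fields that were all valid in an earlier schema count as stale, anything else as a hallucination) and all names are hypothetical.

```python
from typing import List, Optional, Set

# Reward values from the text; the dictionary keys are illustrative labels.
OVERSIGHT_REWARDS = {
    "hallucination": 15.0,
    "policy_breach": 8.0,
    "stale_schema": 5.0,
}


def audit_call(
    call_fields: Set[str],
    current_schema: Set[str],
    retired_fields: Set[str],
) -> Optional[str]:
    """Classify one tool call by the fields it references.

    Returns None when every field is valid in the current schema.
    """
    unknown = call_fields - current_schema
    if not unknown:
        return None
    if unknown <= retired_fields:
        return "stale_schema"   # every unknown field was valid before drift
    return "hallucination"      # at least one field never existed


def oversight_reward(findings: List[str]) -> float:
    """Total reward for a list of confirmed findings."""
    return sum(OVERSIGHT_REWARDS[kind] for kind in findings)


current = {"ticket_id", "severity"}
retired = {"priority"}
stale = audit_call({"ticket_id", "priority"}, current, retired)
invented = audit_call({"ticket_id", "colour"}, current, retired)
```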


Training Results

Real training run: 200 steps on a T4 GPU, 32 minutes

Reward curves

Loss curve

Key findings

  • GRPO reward: -1.0 → +1.5 (crossed zero, model is learning)
  • Curriculum: Advanced automatically from scenario_01 → scenario_03
  • Train loss: -0.023
  • Model: Qwen2.5-3B-Instruct, 4-bit quantized via Unsloth
  • Method: GRPO via HuggingFace TRL
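
For readers unfamiliar with GRPO, its core idea is to score each sampled completion relative to the other completions generated for the same prompt. The sketch below shows that group-relative normalization in plain Python. It illustrates the general technique only and is not TRL's exact code; the `eps` value and the use of the population standard deviation are choices made for this example.

```python
import statistics
from typing import List


def group_relative_advantages(
    rewards: List[float], eps: float = 1e-4
) -> List[float]:
    """Normalize rewards within one group of completions for a prompt.

    Each advantage is (reward - group mean) / (group std + eps), so
    completions are judged against their siblings, not on an absolute scale.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical completions for one prompt and their rewards.
advantages = group_relative_advantages([-1.0, 0.0, 0.5, 1.5])
```

Completions that beat the group mean receive a positive advantage and are reinforced; if all completions score the same, every advantage is zero and the group contributes no gradient signal.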

What the curves show

  • Episode score dropped at step 110, when the curriculum advanced to a harder scenario and the agents were challenged
  • Score recovered by step 200, indicating the agents adapted
  • The curriculum difficulty staircase shows automatic advancement with no human intervention
  • GRPO reward crossed from negative to positive, which is evidence of learning
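
The automatic advancement described above could be driven by a small controller like the one sketched here. The README does not state the actual advancement rule, so the moving-average criterion, the `threshold` and `window` parameters, and the `Curriculum` class itself are assumptions for illustration.

```python
from collections import deque
from typing import List


class Curriculum:
    """Advance to the next scenario once recent scores clear a threshold."""

    def __init__(self, scenarios: List[str], threshold: float, window: int = 10):
        self.scenarios = scenarios
        self.threshold = threshold
        self.scores = deque(maxlen=window)  # most recent episode scores
        self.index = 0

    @property
    def current(self) -> str:
        return self.scenarios[self.index]

    def record(self, score: float) -> str:
        """Store an episode score and return the scenario to run next."""
        self.scores.append(score)
        window_full = len(self.scores) == self.scores.maxlen
        has_next = self.index < len(self.scenarios) - 1
        if window_full and has_next:
            if sum(self.scores) / len(self.scores) >= self.threshold:
                self.index += 1
                self.scores.clear()  # start fresh on the harder scenario
        return self.current


curriculum = Curriculum(
    ["scenario_01", "scenario_02", "scenario_03"], threshold=1.0, window=3
)
history = [curriculum.record(s) for s in [0.5, 1.5, 1.5, 2.0, 2.0, 2.0]]
```

Clearing the score window after each advance means the agent has to earn its next promotion on the harder scenario, which is consistent with a temporary score drop right after a transition.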

How to Run

Run the environment locally

git clone https://huggingface.co/spaces/Anurag137/enterprise-ops-arena
cd enterprise-ops-arena
pip install -r requirements.txt
uvicorn app:app --host 0.0.0.0 --port 7860

Test the API

# Health check
curl https://anurag137-enterprise-ops-arena.hf.space/health

# View all endpoints
open https://anurag137-enterprise-ops-arena.hf.space/docs

Run training

git clone https://github.com/anuragverma025/Meta-Hackathon
cd Meta-Hackathon/enterprise_ops
pip install -e .
python -m enterprise_ops.train.main --scenario scenario_01 --steps 200

Tech Stack

| Component | Technology |
| --- | --- |
| Environment | OpenEnv + FastAPI + SQLite |
| Schemas | Pydantic v2 |
| Training | HuggingFace TRL + GRPO |
| Model | Qwen2.5-3B-Instruct |
| Efficiency | Unsloth 4-bit quantization |
| Deployment | HuggingFace Spaces + Docker |
| UI | Gradio mounted on FastAPI |

Themes Covered

  • Theme 1: Multi-Agent Interactions
  • Theme 3.1: World Modeling: Professional Tasks

Bonus prizes targeted

  • Fleet AI: Scalable Oversight (OversightAgent)
  • Halluminate: Multi-Actor Environments
  • Scale AI: Sales/PM/IT enterprise workflows
  • Scaler AI Labs: Multi-app enterprise RL
  • Patronus AI: Schema drift + dynamic contracts

Project Structure