title: Python Code Review Environment Server
sdk: docker
app_port: 8000
base_path: /web
pinned: false
tags:
- openenv
OpenEnv Python Code Review Environment
Production-ready hackathon submission for OpenEnv evaluation, deterministic validator runs, and Hugging Face Docker deployment.
Architecture
root
|- inference.py # Root validator entrypoint
|- openenv.yaml # OpenEnv manifest
|- app/
| |- agents/ # Action policy and fallback strategy
| |- env/ # RL loop runner and stdout contract
| |- models/ # Inference dataclasses/config
| |- services/ # OpenAI client wrapper with retries
| `- utils/ # Formatting, task loading, log suppression
|- server/
| |- env.py # OpenEnv environment and reward shaping
| |- app.py # FastAPI/OpenEnv app, optional Gradio mount
| `- Dockerfile # Alternate Docker build path
|- Dockerfile # Root deployment Docker image
|- graders/ # Syntax, bug-fix, optimization graders
|- tasks/ # Deterministic benchmark tasks and references
|- services/ # Multi-domain analysis services
|- analyzers/ # Domain-specific analyzers
|- models/ # Lazy-loaded PyTorch scoring model
|- schemas/ # API request/response contracts
`- tests/ # Local validation coverage
Runtime flow:
inference.py
-> app.env.runner.InferenceRunner
-> env.reset(task_id=...)
-> ReviewAgent(action planning)
-> env.step_result(action)
-> strict [START]/[STEP]/[END] output
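The flow above can be sketched as a small loop. This is a minimal illustration, not the project's runner: ToyEnv, StepResult, the default env and model names, and the fixed action plan are hypothetical stand-ins. Only reset(task_id=...), step_result(action), last_action_error, and the [START]/[STEP]/[END] markers come from this README.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepResult:
    """Hypothetical result shape; the real environment may differ."""
    reward: float
    done: bool
    last_action_error: Optional[str] = None


class ToyEnv:
    """Stand-in environment that finishes when 'submit_solution' is sent."""

    def reset(self, task_id: str) -> None:
        self.task_id = task_id

    def step_result(self, action: str) -> StepResult:
        if action == "submit_solution":
            return StepResult(reward=0.99, done=True)
        return StepResult(reward=0.5, done=False)


def run_episode(env, task_id: str, plan: List[str],
                env_name: str = "toy_env", model_name: str = "toy-model",
                max_steps: int = 8) -> List[str]:
    """Run reset() then step_result() until done and return the emitted lines."""
    lines = [f"[START] task={task_id} env={env_name} model={model_name}"]
    rewards: List[float] = []
    success = False
    try:
        env.reset(task_id=task_id)
        done = False
        while not done and len(rewards) < max_steps:
            # Repeat the last planned action once the plan is exhausted.
            action = plan[min(len(rewards), len(plan) - 1)]
            result = env.step_result(action)
            rewards.append(result.reward)
            done = result.done
            error = result.last_action_error or "null"
            lines.append(
                f"[STEP] step={len(rewards)} action={action} "
                f"reward={result.reward:.2f} done={str(done).lower()} error={error}"
            )
        success = done
    except Exception:
        # Any failure still falls through to the [END] line below.
        success = False
    joined = ",".join(f"{r:.2f}" for r in rewards)
    lines.append(f"[END] success={str(success).lower()} steps={len(rewards)} rewards={joined}")
    return lines
```

Returning the lines instead of printing them keeps the sketch easy to check; a real entrypoint would write each line to stdout as it is produced.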
What Was Fixed
- inference.py now lives at the repo root and delegates to a strict runner under app/env.
- OpenAI usage is limited to the official Python client: client = OpenAI(base_url=API_BASE_URL, api_key=provider_token).
- Defaulted env vars are enforced for API_BASE_URL and MODEL_NAME; the runtime now selects HF_TOKEN for the Hugging Face router and OPENAI_API_KEY for direct OpenAI usage.
- Output now matches the required single-line contract exactly and always emits [END], including failure paths.
- The RL loop now uses reset() plus step_result() in a proper while not done loop.
- Step errors now surface through last_action_error and are printed in [STEP].
- Reward shaping is now dynamic in the OpenEnv environment: code quality, test progress, runtime progress, error removal, regressions, and completion are all part of the reward.
- The API-side reward service is no longer a static weighted sum and now exposes quality, error-reduction, and completion signals.
- The Docker image now builds from the repo root, caches dependency installation more effectively, and runs server.app:app directly on port 8000.
- Server startup is lighter: the PyTorch analyzer is lazy-loaded and the Gradio demo is disabled by default.
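As a rough illustration of what a dynamic, shaped reward of this kind can look like, the sketch below combines the signals named above into one clamped score. Every field name, weight, and the clamping range is an assumption made for illustration; none of it is taken from server/env.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical per-step measurements; scores are assumed to lie in [0, 1]."""
    quality: float        # static code-quality score
    tests_passed: float   # fraction of tests passing
    runtime_score: float  # 1.0 means at or better than the reference runtime
    error_count: int      # outstanding syntax/runtime errors


def shaped_reward(prev: Snapshot, curr: Snapshot, completed: bool) -> float:
    """Combine progress deltas into one reward clamped to [0, 1].

    All weights below are illustrative assumptions.
    """
    reward = 0.2 * curr.quality
    reward += 0.4 * (curr.tests_passed - prev.tests_passed)    # test progress
    reward += 0.1 * (curr.runtime_score - prev.runtime_score)  # runtime progress
    reward += 0.05 * (prev.error_count - curr.error_count)     # error removal
    if curr.tests_passed < prev.tests_passed:
        reward -= 0.1  # extra penalty when previously passing tests regress
    if completed:
        reward += 0.3  # completion bonus
    return max(0.0, min(1.0, reward))
```

Because most terms are deltas against the previous step, repeating an action that changes nothing earns only the quality term, which is one way to discourage stalling.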
Local Setup
Install dev dependencies:
pip install -e .[dev]
Run the test suite:
pytest -q
Run the OpenEnv server locally:
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
Optional demo UI:
set ENABLE_GRADIO_DEMO=true
set ENABLE_WEB_INTERFACE=true
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
Inference Contract
Required environment variables:
- API_BASE_URL. Default: https://router.huggingface.co/v1
- MODEL_NAME. Default: Qwen/Qwen2.5-3B-Instruct
- HF_TOKEN. Required for https://router.huggingface.co/v1
- OPENAI_API_KEY. Required for https://api.openai.com/v1
Examples (Hugging Face router first, then direct OpenAI):
set API_BASE_URL=https://router.huggingface.co/v1
set MODEL_NAME=Qwen/Qwen2.5-3B-Instruct
set HF_TOKEN=hf_xxx
python inference.py
set API_BASE_URL=https://api.openai.com/v1
set MODEL_NAME=gpt-4.1-mini
set OPENAI_API_KEY=sk-xxx
python inference.py
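The variable handling above can be sketched as follows. This is an assumption about how the selection might work, not the project's code: resolve_settings is a hypothetical helper, and inferring the provider from the base URL host is a guess. The defaults and variable names are the ones listed in this section.

```python
import os
from typing import Mapping, Optional, Tuple

DEFAULT_API_BASE_URL = "https://router.huggingface.co/v1"
DEFAULT_MODEL_NAME = "Qwen/Qwen2.5-3B-Instruct"


def resolve_settings(env: Mapping[str, str]) -> Tuple[str, str, Optional[str]]:
    """Return (base_url, model_name, provider_token) from an environment mapping."""
    base_url = env.get("API_BASE_URL") or DEFAULT_API_BASE_URL
    model_name = env.get("MODEL_NAME") or DEFAULT_MODEL_NAME
    # Assumption: the provider is inferred from the base URL host.
    if "api.openai.com" in base_url:
        token = env.get("OPENAI_API_KEY")
    else:
        token = env.get("HF_TOKEN")
    # An empty string is treated the same as a missing token.
    return base_url, model_name, token or None


if __name__ == "__main__":
    base_url, model_name, token = resolve_settings(os.environ)
    print(base_url, model_name, "token set" if token else "no token")
```

The resolved values would then be handed to the official client as OpenAI(base_url=base_url, api_key=token); a missing token is returned as None so the caller can decide whether to skip LLM calls.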
Expected stdout shape:
[START] task=syntax_fix_invoice_totals env=python_code_review_env model=Qwen/Qwen2.5-3B-Instruct
[STEP] step=1 action=run_tests reward=0.12 done=false error=null
[STEP] step=2 action=edit_code reward=0.96 done=false error=null
[STEP] step=3 action=run_tests reward=0.99 done=false error=null
[STEP] step=4 action=submit_solution reward=0.99 done=true error=null
[END] success=true steps=4 rewards=0.12,0.96,0.99,0.99
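A transcript of this shape can be checked mechanically before submission. The validator below is a sketch inferred only from the sample above: the regular expressions, the two-decimal reward format, and the rule that [END] must repeat the step rewards are assumptions, not the official validator's rules.

```python
import re
from typing import List

START_RE = re.compile(r"^\[START\] task=(\S+) env=(\S+) model=(\S+)$")
STEP_RE = re.compile(
    r"^\[STEP\] step=(\d+) action=(\S+) reward=(-?\d+\.\d{2}) "
    r"done=(true|false) error=(.+)$"
)
END_RE = re.compile(r"^\[END\] success=(true|false) steps=(\d+) rewards=([-\d.,]*)$")


def validate_transcript(lines: List[str]) -> bool:
    """Check for one [START], consecutively numbered [STEP]s, and a final [END]."""
    if len(lines) < 2 or not START_RE.match(lines[0]):
        return False
    end = END_RE.match(lines[-1])
    if not end:
        return False
    rewards = []
    for expected_step, line in enumerate(lines[1:-1], start=1):
        step = STEP_RE.match(line)
        if not step or int(step.group(1)) != expected_step:
            return False
        rewards.append(step.group(3))
    # The [END] summary must agree with the steps that were printed.
    return int(end.group(2)) == len(rewards) and end.group(3) == ",".join(rewards)
```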
Docker
Build from the project root:
docker build -t openenv-python-code-review-env .
Run locally:
docker run --rm -p 8000:8000 ^
-e API_BASE_URL=https://router.huggingface.co/v1 ^
-e MODEL_NAME=Qwen/Qwen2.5-3B-Instruct ^
-e HF_TOKEN=hf_xxx ^
openenv-python-code-review-env
Container behavior:
- Base image: python:3.11-slim-bookworm
- Build context: project root
- The runtime image installs the minimal API dependency set by default; Streamlit, PyTorch, and transformers stay out of the container, while Gradio is only used if the demo env flags are enabled.
- Healthcheck: GET /health
- Default entrypoint: uvicorn server.app:app --host 0.0.0.0 --port 8000
Hugging Face Spaces
Recommended deployment steps:
- Create a Docker Space.
- Push this repository as-is.
- Let Spaces build from the root Dockerfile.
- Set Space secrets: HF_TOKEN
- Set Space variables as needed: API_BASE_URL, MODEL_NAME, ENABLE_GRADIO_DEMO=false. ENABLE_WEB_INTERFACE=false is also supported for OpenEnv-managed deploys.
- Confirm the app listens on port 8000.
- Smoke-test: /health, /reset, /step
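The smoke test in the last step could be scripted along these lines. GET /health is stated in this README; using POST for /reset and /step, and the JSON bodies shown (including the action_type field), are assumptions about the server's API and may need adjusting. smoke_test and http_fetch are hypothetical helper names.

```python
import json
import urllib.request
from typing import Callable, Dict, Optional

Fetch = Callable[[str, str, Optional[dict]], int]


def http_fetch(method: str, url: str, payload: Optional[dict]) -> int:
    """Send one request with the standard library and return the HTTP status."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def smoke_test(base_url: str, fetch: Fetch = http_fetch) -> Dict[str, int]:
    """Call /health, /reset, /step in order and map each path to its status."""
    # Assumption: /reset and /step accept POST with JSON bodies shaped like this.
    checks = [
        ("GET", "/health", None),
        ("POST", "/reset", {}),
        ("POST", "/step", {"action": {"action_type": "run_tests"}}),
    ]
    root = base_url.rstrip("/")
    return {path: fetch(method, root + path, payload) for method, path, payload in checks}
```

Against a local container this would be called as smoke_test("http://localhost:8000"); the fetch parameter exists so the request logic can be swapped out in tests.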
Performance Notes
- The maximum number of concurrent environments defaults to 2, aligned with a 2 vCPU / 8 GB RAM target.
- The analyzer model is lazy-loaded instead of being created at startup.
- The inference runner relies on short prompts, low token budgets, and limited retries.
- The policy uses deterministic reference-code fallback instead of expensive iterative code generation.
- Public validation is preferred before final submission to avoid wasted hidden-eval steps.
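The lazy-loading note above follows a common pattern: defer building an expensive object until the first request needs it. The class below is a generic sketch of that pattern, not the project's analyzer code; LazyResource is a hypothetical name, and the locking is an assumption about what a multi-threaded server would want.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazyResource(Generic[T]):
    """Build an expensive object on first access and reuse it afterwards."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._value: Optional[T] = None
        self._loaded = False
        self._lock = threading.Lock()

    def get(self) -> T:
        if not self._loaded:  # fast path once the value exists
            with self._lock:
                if not self._loaded:  # re-check inside the lock
                    self._value = self._factory()
                    self._loaded = True
        return self._value  # type: ignore[return-value]
```

A server would create the wrapper at import time, for example LazyResource(load_model) with a hypothetical load_model function, and call get() inside the request handler, so startup pays nothing until the model is actually used.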
Known Limitations
- If HF_TOKEN is absent, inference still completes with deterministic fallback actions, but LLM guidance is skipped.
- The benchmark tasks are deterministic and intentionally small; this is good for validator stability but not a full training benchmark.
- Gradio remains optional and is disabled by default to keep deployment lighter.
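The first limitation describes a policy that degrades to fixed actions when no LLM is available. A minimal sketch of that idea follows; choose_action, the ask_llm callback, and the fixed plan are hypothetical, with the action names borrowed from the sample transcript in this README.

```python
from typing import Callable, Optional, Sequence

# Hypothetical fixed plan; the action names match the sample transcript.
FALLBACK_PLAN: Sequence[str] = ("run_tests", "edit_code", "run_tests", "submit_solution")
KNOWN_ACTIONS = frozenset(FALLBACK_PLAN)


def choose_action(step_index: int,
                  ask_llm: Optional[Callable[[int], str]] = None) -> str:
    """Prefer the LLM suggestion; use the fixed plan if it is missing or fails."""
    fallback = FALLBACK_PLAN[min(step_index, len(FALLBACK_PLAN) - 1)]
    if ask_llm is None:  # for example, no token was configured
        return fallback
    try:
        suggestion = ask_llm(step_index)
    except Exception:
        return fallback  # network or parsing failures never stop the episode
    # Reject anything that is not a recognised action name.
    return suggestion if suggestion in KNOWN_ACTIONS else fallback
```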