---
title: Python Code Review Environment Server
sdk: docker
app_port: 8000
base_path: /web
pinned: false
tags:
- openenv
---
# OpenEnv Python Code Review Environment
Production-ready hackathon submission for OpenEnv evaluation, deterministic validator runs, and Hugging Face Docker deployment.
## Architecture
```text
root
|- inference.py # Root validator entrypoint
|- openenv.yaml # OpenEnv manifest
|- app/
| |- agents/ # Action policy and fallback strategy
| |- env/ # RL loop runner and stdout contract
| |- models/ # Inference dataclasses/config
| |- services/ # OpenAI client wrapper with retries
| `- utils/ # Formatting, task loading, log suppression
|- server/
| |- env.py # OpenEnv environment and reward shaping
| |- app.py # FastAPI/OpenEnv app, optional Gradio mount
| `- Dockerfile # Alternate Docker build path
|- Dockerfile # Root deployment Docker image
|- graders/ # Syntax, bug-fix, optimization graders
|- tasks/ # Deterministic benchmark tasks and references
|- services/ # Multi-domain analysis services
|- analyzers/ # Domain-specific analyzers
|- models/ # Lazy-loaded PyTorch scoring model
|- schemas/ # API request/response contracts
`- tests/ # Local validation coverage
```
Runtime flow:
```text
inference.py
-> app.env.runner.InferenceRunner
-> env.reset(task_id=...)
-> ReviewAgent(action planning)
-> env.step_result(action)
-> strict [START]/[STEP]/[END] output
```
## What Was Fixed
- `inference.py` now lives at the repo root and delegates to a strict runner under `app/env`.
- OpenAI usage is limited to the official Python client:
`client = OpenAI(base_url=API_BASE_URL, api_key=provider_token)`.
- `API_BASE_URL` and `MODEL_NAME` now fall back to sensible defaults; the runtime selects `HF_TOKEN` for the Hugging Face router and `OPENAI_API_KEY` for direct OpenAI usage.
- Output now matches the required single-line contract exactly and always emits `[END]`, including failure paths.
- The RL loop now uses `reset()` plus `step_result()` in a proper `while not done` loop.
- Step errors now surface through `last_action_error` and are printed in `[STEP]`.
- Reward shaping is now dynamic in the OpenEnv environment:
code quality, test progress, runtime progress, error removal, regressions, and completion are all part of the reward.
- The API-side reward service is no longer a static weighted sum and now exposes quality, error-reduction, and completion signals.
- The Docker image now builds from the repo root, caches dependency installation more effectively, and runs `server.app:app` directly on port `8000`.
- Server startup is lighter:
the PyTorch analyzer is lazy-loaded and the Gradio demo is disabled by default.
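As a rough illustration of the dynamic reward shaping described above (the weights below are invented for the example, not the environment's actual coefficients):

```python
def shape_reward(quality: float, tests_passed: int, tests_total: int,
                 errors_removed: int, regressions: int, completed: bool) -> float:
    """Blend code quality, test progress, error removal, regressions,
    and completion into a single [0, 1] reward. Weights are illustrative."""
    test_progress = tests_passed / tests_total if tests_total else 0.0
    reward = (0.3 * quality
              + 0.4 * test_progress
              + 0.1 * errors_removed
              - 0.2 * regressions)
    if completed:
        reward += 0.2  # completion bonus
    return max(0.0, min(1.0, reward))
```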
## Local Setup
Install dev dependencies:
```bash
pip install -e ".[dev]"
```
Run the test suite:
```bash
pytest -q
```
Run the OpenEnv server locally:
```bash
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
```
Optional demo UI:
```bash
export ENABLE_GRADIO_DEMO=true
export ENABLE_WEB_INTERFACE=true
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
```
## Inference Contract
Environment variables:
- `API_BASE_URL`
Default: `https://router.huggingface.co/v1`
- `MODEL_NAME`
Default: `Qwen/Qwen2.5-3B-Instruct`
- `HF_TOKEN`
Required for `https://router.huggingface.co/v1`
- `OPENAI_API_KEY`
Required for `https://api.openai.com/v1`
Example:
```bash
export API_BASE_URL=https://router.huggingface.co/v1
export MODEL_NAME=Qwen/Qwen2.5-3B-Instruct
export HF_TOKEN=hf_xxx
python inference.py
```
```bash
export API_BASE_URL=https://api.openai.com/v1
export MODEL_NAME=gpt-4.1-mini
export OPENAI_API_KEY=sk-xxx
python inference.py
```
Expected stdout shape:
```text
[START] task=syntax_fix_invoice_totals env=python_code_review_env model=Qwen/Qwen2.5-3B-Instruct
[STEP] step=1 action=run_tests reward=0.12 done=false error=null
[STEP] step=2 action=edit_code reward=0.96 done=false error=null
[STEP] step=3 action=run_tests reward=0.99 done=false error=null
[STEP] step=4 action=submit_solution reward=0.99 done=true error=null
[END] success=true steps=4 rewards=0.12,0.96,0.99,0.99
```
## Docker
Build from the project root:
```bash
docker build -t openenv-python-code-review-env .
```
Run locally:
```bash
docker run --rm -p 8000:8000 \
  -e API_BASE_URL=https://router.huggingface.co/v1 \
  -e MODEL_NAME=Qwen/Qwen2.5-3B-Instruct \
  -e HF_TOKEN=hf_xxx \
openenv-python-code-review-env
```
Container behavior:
- Base image: `python:3.11-slim-bookworm`
- Build context: project root
- Runtime image installs the minimal API dependency set by default; Streamlit, PyTorch, and transformers stay out of the container, while Gradio is only used if the demo env flags are enabled.
- Healthcheck: `GET /health`
- Default entrypoint: `uvicorn server.app:app --host 0.0.0.0 --port 8000`
## Hugging Face Spaces
Recommended deployment steps:
1. Create a Docker Space.
2. Push this repository as-is.
3. Let Spaces build from the root `Dockerfile`.
4. Set Space secrets:
`HF_TOKEN`
5. Set Space variables as needed:
`API_BASE_URL`, `MODEL_NAME`, `ENABLE_GRADIO_DEMO=false`
`ENABLE_WEB_INTERFACE=false` is also supported for OpenEnv-managed deploys.
6. Confirm the app listens on port `8000`.
7. Smoke-test:
`/health`
`/reset`
`/step`
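A minimal smoke-test helper, assuming only the endpoint paths above (function names and the example Space URL are illustrative):

```python
import urllib.request

ENDPOINTS = ("/health", "/reset", "/step")


def smoke_urls(base_url: str) -> list[str]:
    """Build the three URLs to probe after deploy."""
    return [base_url.rstrip("/") + path for path in ENDPOINTS]


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """GET /health and report whether it returned HTTP 200."""
    with urllib.request.urlopen(smoke_urls(base_url)[0], timeout=timeout) as resp:
        return resp.status == 200
```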
## Performance Notes
- Max concurrent environments default to `2`, aligned with a `2 vCPU / 8 GB RAM` target.
- The analyzer model is lazy-loaded instead of being created at startup.
- The inference runner relies on short prompts, low token budgets, and limited retries.
- The policy uses deterministic reference-code fallback instead of expensive iterative code generation.
- Public validation is preferred before final submission to avoid wasted hidden-eval steps.
## Known Limitations
- If `HF_TOKEN` is absent, inference still completes with deterministic fallback actions, but LLM guidance is skipped.
- The benchmark tasks are deterministic and intentionally small; this is good for validator stability but not a full training benchmark.
- Gradio remains optional and is disabled by default to keep deployment lighter.