uvpatel7271 committed on
Commit a83cb85 · verified · 1 Parent(s): 4451363

Upload folder using huggingface_hub
DEMO_SCRIPT.md ADDED
@@ -0,0 +1,12 @@
+ # TorchReview Copilot Demo Script
+
+ ## 60-90 Second Walkthrough
+
+ 1. Open the Hugging Face Space and introduce TorchReview Copilot as an AI-powered code review and improvement system built with PyTorch.
+ 2. Point to the problem statement: manual code review is slow, inconsistent, and hard to scale.
+ 3. Select the `Fix the invoice total syntax regression` example to show the app loading a broken code sample together with the context window.
+ 4. Highlight the **Live Triage Radar**, the ML quality score, and the RL-ready reward score.
+ 5. Explain that the PyTorch layer uses CodeBERTa embeddings to compare the input against known code-quality patterns from the OpenEnv task catalog.
+ 6. Scroll to the three-step improvement plan and call out the progression: syntax and bug fixes, edge cases, then scalability.
+ 7. Switch to the performance example to show the confidence profile and reward changing for a different class of issue.
+ 8. Close by noting that OpenEnv still powers deterministic validation under the hood, so the demo remains grounded in measurable task outcomes.
Dockerfile CHANGED
@@ -6,9 +6,16 @@ ENV PYTHONDONTWRITEBYTECODE=1 \
 
  WORKDIR /app
 
- COPY pyproject.toml README.md openenv.yaml __init__.py client.py compat.py models.py inference.py /app/
+ COPY pyproject.toml README.md DEMO_SCRIPT.md openenv.yaml __init__.py client.py compat.py openenv_models.py inference.py triage.py triage_catalog.py triage_models.py launch.py /app/
+ COPY api /app/api
+ COPY app /app/app
+ COPY analyzers /app/analyzers
+ COPY models /app/models
+ COPY schemas /app/schemas
  COPY server /app/server
+ COPY services /app/services
  COPY tasks /app/tasks
+ COPY utils /app/utils
  COPY graders /app/graders
 
  RUN python -m pip install --upgrade pip && \
@@ -17,7 +24,7 @@ RUN python -m pip install --upgrade pip && \
  EXPOSE 8000
 
  HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
-     CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000/health', timeout=3).read()"
+     CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000', timeout=3).read()"
 
  ENV ENABLE_WEB_INTERFACE=true
- CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
+ CMD ["python", "launch.py"]
README.md CHANGED
@@ -1,189 +1,63 @@
  ---
- title: Python Code Review Environment
  colorFrom: yellow
- colorTo: blue
  sdk: docker
  pinned: false
  app_port: 8000
  tags:
  - openenv
  - code-review
- - python
  base_path: /web
  ---
 
- # python_code_review_env
-
- `python_code_review_env` is a production-style OpenEnv environment that simulates a realistic Python code review workflow. An agent inspects broken code, edits it, runs tests, and submits a final solution against deterministic graders for syntax repair, bug fixing, and optimization/refactoring.
-
- ## Environment design
-
- - `Observation` includes task instructions, current code, syntax errors, public test output, action history, and remaining attempts.
- - `Action` is structured as `analyze_code`, `edit_code`, `run_tests`, or `submit_solution`.
- - `Reward` is shaped and non-binary. The environment awards syntax progress, test progress, correctness, and quality improvements while penalizing invalid actions, timeouts, regressions, and unchanged edits.
- - `State` exposes the internal episode snapshot through `/state`.
-
- ## Task set
-
- 1. `syntax_fix_invoice_totals` (easy)
-    Fix a syntax regression in an invoice normalization helper.
- 2. `bug_fix_session_windows` (medium)
-    Repair a session-collapsing bug using deterministic public and hidden tests.
- 3. `optimization_rank_active_users` (hard)
-    Refactor a slow ranking function and earn additional score from runtime improvement plus AST/style quality.
-
- ## Action schema
-
- ```json
- {
-   "action_type": "edit_code",
-   "code": "def function(...):\n    ..."
- }
- ```
-
- Supported `action_type` values:
-
- - `analyze_code`
- - `edit_code`
- - `run_tests`
- - `submit_solution`
-
- ## Observation schema
-
- ```json
- {
-   "task_description": "...",
-   "current_code": "...",
-   "errors": "...",
-   "test_results": "...",
-   "history": []
- }
- ```
-
- The full observation also includes `task_id`, `difficulty`, `task_kind`, `visible_tests`, `attempts_remaining`, `score`, `last_action_status`, `reward`, `done`, and a structured `reward_details` breakdown.
-
- ## Deterministic grading
-
- - Syntax tasks use `compile()` plus hidden behavioral checks.
- - Bug-fix tasks use deterministic function-call cases that behave like pytest assertions.
- - Optimization tasks combine correctness, runtime benchmarking, and AST/style quality scoring.
- - Infinite loops and long-running solutions are sandboxed with subprocess timeouts and receive penalties.
- - All scores are clamped to `[0.0, 1.0]`.
-
- ## Run locally
-
- Install dependencies:
-
- ```bash
- pip install .
- ```
-
- Start the API server:
-
- ```bash
- uvicorn server.app:app --host 0.0.0.0 --port 8000
- ```
-
- Smoke-test the environment:
-
- ```bash
- curl http://localhost:8000/health
- curl http://localhost:8000/state
- ```
-
- OpenEnv validation:
-
- ```bash
- openenv validate
- ```
-
- ## Docker build
-
- The Docker image no longer depends on `ghcr.io/meta-pytorch/openenv-base:latest`, which removes the TLS handshake failure from the original build path.
-
- ```bash
- # Run from repo root
- docker build -t python-code-review-env -f server/Dockerfile .
- docker run --rm -p 8000:8000 python-code-review-env
- ```
-
- If you run the build from inside `server/`, you must point the context at the repo root:
-
- ```bash
- docker build -t python-code-review-env -f Dockerfile ..
- ```
-
- Expected health check:
-
- ```bash
- curl http://localhost:8000/health
- ```
-
- ## Hugging Face Spaces deployment
-
- 1. Create a Docker Space.
- 2. Push this repository content to the Space.
- 3. Ensure port `8000` is exposed.
- 4. Wait for the container to build.
- 5. Verify `/reset` and `/health` return `200`.
-
- The image is CPU-friendly and designed for a small Hugging Face Space such as `2 vCPU / 8 GB RAM`.
-
- ## Inference baseline
-
- `inference.py` uses an OpenAI-compatible client:
-
- ```python
- client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
- ```
-
- Supported providers include:
-
- - Gemini through an OpenAI-compatible gateway
- - OpenRouter
- - Together AI
- - DeepSeek-compatible OpenAI endpoints
-
- Run it with a free/open provider:
-
- ```bash
- set API_BASE_URL=https://openrouter.ai/api/v1
- set API_KEY=...
- set MODEL=deepseek/deepseek-chat-v3-0324:free
- python inference.py
- ```
-
- If no credentials are supplied, the script falls back to a deterministic smoke-test policy that applies the reference fix for each task so the environment can still be validated end to end.
-
- Example output:
-
- ```text
- Task 1 Score: 1.0
- Task 2 Score: 1.0
- Task 3 Score: 0.9
- Final Score: 1.0
- ```
-
- ## Project structure
 
  ```text
- python_env/
- ├── client.py
- ├── graders/
- │   ├── bug_fix.py
- │   ├── dispatch.py
- │   ├── optimization.py
- │   ├── shared.py
- │   └── syntax.py
- ├── inference.py
- ├── models.py
- ├── openenv.yaml
- ├── README.md
- ├── server/
- │   ├── app.py
- │   ├── Dockerfile
- │   ├── env.py
- │   └── python_env_environment.py
- └── tasks/
-     └── catalog.py
- ```

  ---
+ title: TorchReview Copilot
+ emoji: 🧠
  colorFrom: yellow
+ colorTo: red
  sdk: docker
  pinned: false
  app_port: 8000
  tags:
+ - pytorch
+ - gradio
+ - fastapi
  - openenv
  - code-review
  base_path: /web
  ---
 
+ # TorchReview Copilot
 
+ TorchReview Copilot is an **AI-powered code review and improvement system using PyTorch** to analyze Python code, predict quality, generate structured improvement suggestions, and compute an RL-ready reward score.
 
+ It upgrades the original OpenEnv hackathon environment into a judge-friendly product demo: a polished Hugging Face Space on top, with the deterministic OpenEnv validation engine preserved underneath.
 
+ **Live demo:** https://huggingface.co/spaces/uvpatel7271/final-python-env
+ **Repository:** https://github.com/uvpatel/final-python-env
 
+ ## Problem Statement
 
+ Engineering teams lose time during incident response and code review because broken Python snippets often arrive with noisy traces, partial test output, and unclear ownership. Before fixing anything, someone still has to answer:
 
+ - Is this a syntax issue, a logic bug, or a performance regression?
+ - How risky is the repair?
+ - What should be checked first?
 
+ That triage step is repetitive, error-prone, and often slows down the actual fix.
 
+ ## Solution
 
+ TorchReview Copilot turns code, traceback text, and a short context window into a practical code-review report:
 
+ - **Issue classification:** syntax, logic, or performance
+ - **ML quality score:** predicted code quality from PyTorch embeddings
+ - **Reward score:** an RL-ready score combining model quality, lint quality, and a complexity penalty
+ - **Live Triage Radar:** a confidence visualization for all issue classes
+ - **Nearest known pattern:** the closest OpenEnv task match
+ - **Improvement plan:** step 1 syntax/bug fixes, step 2 edge cases, step 3 scalability
 
+ ## Why PyTorch Matters
 
+ This project uses **PyTorch for real inference**, not placeholder branching:
 
+ - `transformers` + `torch` load `huggingface/CodeBERTa-small-v1`
+ - embeddings compare the input code against OpenEnv issue prototypes
+ - ML and static-analysis signals are combined into the final scores
 
+ ## How It Works
 
+ `Input → static checks → PyTorch embeddings → prediction → suggestions → reward`
 
+ ## Reward Formula
 
  ```text
+ reward = (0.5 x ML_quality_score) + (0.3 x lint_score) - (0.2 x complexity_penalty)
  ```
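The reward formula above is stated in the README but not shown as code in this commit; a minimal sketch of how such a score could be computed, assuming the environment's `[0.0, 1.0]` clamping convention (the function name `reward_score` is illustrative, not part of the repository):

```python
def reward_score(ml_quality: float, lint_score: float, complexity_penalty: float) -> float:
    """Combine model, lint, and complexity signals into one RL-ready reward.

    Weights follow the README's stated formula:
    0.5 * ML quality + 0.3 * lint quality - 0.2 * complexity penalty.
    """
    raw = 0.5 * ml_quality + 0.3 * lint_score - 0.2 * complexity_penalty
    # Clamp to [0.0, 1.0], matching the environment's documented score range.
    return max(0.0, min(1.0, raw))
```

With perfect ML and lint scores and no complexity penalty, the reward tops out at 0.8 before clamping ever engages, which leaves headroom for shaping.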
__init__.py CHANGED
@@ -1,7 +1,8 @@
  """Public package exports for python_code_review_env."""
 
  from .client import PythonCodeReviewEnv, PythonEnv
- from .models import (
+ from .models import PyTorchCodeAnalyzerModel
+ from .Models import (
      PythonAction,
      PythonCodeReviewAction,
      PythonCodeReviewObservation,
@@ -9,6 +10,10 @@ from .models import (
      PythonObservation,
      PythonState,
  )
+ from .schemas import AnalyzeCodeRequest, AnalyzeCodeResponse
+ from .services import AnalysisService
+ from .triage import CodeTriageEngine, HashingEmbeddingBackend, TransformersEmbeddingBackend, get_default_engine
+ from .triage_models import TriageResult
 
  __all__ = [
      "PythonAction",
@@ -19,4 +24,13 @@ __all__ = [
      "PythonCodeReviewState",
      "PythonCodeReviewEnv",
      "PythonEnv",
+     "AnalyzeCodeRequest",
+     "AnalyzeCodeResponse",
+     "AnalysisService",
+     "CodeTriageEngine",
+     "HashingEmbeddingBackend",
+     "PyTorchCodeAnalyzerModel",
+     "TransformersEmbeddingBackend",
+     "TriageResult",
+     "get_default_engine",
  ]
analyzers/__init__.py ADDED
@@ -0,0 +1,13 @@
+ """Domain-specific analyzers for multi-domain code understanding."""
+
+ from .dsa_analyzer import analyze_dsa_code
+ from .ds_analyzer import analyze_data_science_code
+ from .ml_analyzer import analyze_ml_code
+ from .web_analyzer import analyze_web_code
+
+ __all__ = [
+     "analyze_dsa_code",
+     "analyze_data_science_code",
+     "analyze_ml_code",
+     "analyze_web_code",
+ ]
analyzers/ds_analyzer.py ADDED
@@ -0,0 +1,56 @@
+ """Analyzer for data-science oriented Python code."""
+
+ from __future__ import annotations
+
+ from typing import Any, Dict
+
+ from schemas.response import AnalysisIssue, DomainAnalysis
+
+
+ def analyze_data_science_code(code: str, parsed: Dict[str, Any], complexity: Dict[str, Any]) -> DomainAnalysis:
+     """Inspect pandas and numpy code for vectorization and leakage concerns."""
+
+     issues = []
+     suggestions = []
+     score = 0.72
+
+     if "iterrows(" in code or "itertuples(" in code:
+         issues.append(
+             AnalysisIssue(
+                 title="Row-wise dataframe iteration detected",
+                 severity="medium",
+                 description="Looping through dataframe rows is usually slower and less scalable than vectorized operations.",
+             )
+         )
+         suggestions.append("Use vectorized pandas or numpy expressions instead of row-wise iteration.")
+         score -= 0.18
+
+     if "inplace=True" in code:
+         suggestions.append("Avoid inplace mutation to keep data pipelines easier to reason about and test.")
+         score -= 0.05
+
+     if "fit_transform(" in code and "train_test_split" not in code:
+         issues.append(
+             AnalysisIssue(
+                 title="Potential data leakage risk",
+                 severity="high",
+                 description="Feature transforms appear before an explicit train/test split.",
+             )
+         )
+         suggestions.append("Split train and validation data before fitting stateful preprocessing steps.")
+         score -= 0.2
+
+     if not suggestions:
+         suggestions.append("Add schema assumptions and null-handling checks for production data quality.")
+
+     return DomainAnalysis(
+         domain="data_science",
+         domain_score=max(0.05, round(score, 4)),
+         issues=issues,
+         suggestions=suggestions,
+         highlights={
+             "vectorization_risk": float("iterrows(" in code or "itertuples(" in code),
+             "time_complexity": complexity["time_complexity"],
+             "uses_pandas": float(parsed.get("uses_pandas", False)),
+         },
+     )
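The leakage check above flags `fit_transform` appearing before a train/test split. A dependency-free sketch of the safe ordering it recommends, using a toy stateful preprocessor (the `Standardizer` class here is illustrative, not part of this repository):

```python
class Standardizer:
    """Toy stateful preprocessor: learns mean and std from one dataset only."""

    def fit(self, values):
        self.mean = sum(values) / len(values)
        variance = sum((v - self.mean) ** 2 for v in values) / len(values)
        self.std = variance ** 0.5 or 1.0  # guard against zero spread
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]


# Split first, then fit on the training portion only, so validation
# statistics never leak into the fitted preprocessor.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train, valid = data[:4], data[4:]
scaler = Standardizer().fit(train)
train_scaled = scaler.transform(train)
valid_scaled = scaler.transform(valid)
```

Fitting on `data` as a whole and then splitting is the pattern the analyzer penalizes: validation rows would have influenced the learned mean and std.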
analyzers/dsa_analyzer.py ADDED
@@ -0,0 +1,48 @@
+ """Analyzer for DSA and competitive-programming style Python code."""
+
+ from __future__ import annotations
+
+ from typing import Any, Dict
+
+ from schemas.response import AnalysisIssue, DomainAnalysis
+
+
+ def analyze_dsa_code(code: str, parsed: Dict[str, Any], complexity: Dict[str, Any]) -> DomainAnalysis:
+     """Inspect algorithmic code for brute-force patterns and efficiency risks."""
+
+     issues = []
+     suggestions = []
+     score = 0.7
+
+     if parsed.get("max_loop_depth", 0) >= 2:
+         issues.append(
+             AnalysisIssue(
+                 title="Nested loops suggest brute-force behavior",
+                 severity="medium",
+                 description="The implementation scans the input multiple times, which is often avoidable in DSA problems.",
+             )
+         )
+         suggestions.append("Consider replacing nested scans with a hashmap, prefix table, or sorted search strategy.")
+         score -= 0.15
+
+     if parsed.get("uses_recursion"):
+         suggestions.append("Verify recursion depth and add memoization or iterative conversion if the input size can grow.")
+         score -= 0.05
+
+     if "sorted(" in code or ".sort(" in code:
+         suggestions.append("Sorting is acceptable here, but validate whether a direct O(n) pass can remove the sort.")
+
+     if not suggestions:
+         suggestions.append("Document the intended time complexity and add edge-case checks for empty input and duplicates.")
+
+     return DomainAnalysis(
+         domain="dsa",
+         domain_score=max(0.05, round(score, 4)),
+         issues=issues,
+         suggestions=suggestions,
+         highlights={
+             "time_complexity": complexity["time_complexity"],
+             "space_complexity": complexity["space_complexity"],
+             "max_loop_depth": float(parsed.get("max_loop_depth", 0)),
+         },
+     )
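The hashmap rewrite this analyzer suggests for nested scans can be sketched concretely. The commit's own example inputs use a brute-force `two_sum`; a single-pass alternative looks like this (a sketch of the suggested pattern, not code from the repository):

```python
def two_sum(nums, target):
    """Single O(n) pass: remember each value's index, look up the complement."""
    seen = {}
    for i, value in enumerate(nums):
        complement = target - value
        if complement in seen:
            # The complement was seen earlier, so the pair is complete.
            return [seen[complement], i]
        seen[value] = i
    return []
```

This trades O(n) extra memory for removing the inner loop, which is exactly the "replace nested scans with a hashmap" suggestion above.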
analyzers/ml_analyzer.py ADDED
@@ -0,0 +1,61 @@
+ """Analyzer for machine-learning and deep-learning code."""
+
+ from __future__ import annotations
+
+ from typing import Any, Dict
+
+ from schemas.response import AnalysisIssue, DomainAnalysis
+
+
+ def analyze_ml_code(code: str, parsed: Dict[str, Any], complexity: Dict[str, Any]) -> DomainAnalysis:
+     """Inspect training and inference logic for common ML / DL mistakes."""
+
+     issues = []
+     suggestions = []
+     score = 0.74
+
+     if "torch" in code and "model.eval()" not in code and "predict" in code.lower():
+         issues.append(
+             AnalysisIssue(
+                 title="Inference path may be missing eval mode",
+                 severity="high",
+                 description="Inference code should place the model in eval mode before prediction.",
+             )
+         )
+         suggestions.append("Call model.eval() before inference to disable training-time behavior such as dropout.")
+         score -= 0.18
+
+     if "torch" in code and "no_grad" not in code and "predict" in code.lower():
+         suggestions.append("Wrap inference in torch.no_grad() to reduce memory usage and avoid unnecessary gradient tracking.")
+         score -= 0.12
+
+     if parsed.get("calls_backward") and not parsed.get("calls_optimizer_step"):
+         issues.append(
+             AnalysisIssue(
+                 title="Backward pass without optimizer step",
+                 severity="medium",
+                 description="Gradients are computed, but the optimizer step is not obvious in the snippet.",
+             )
+         )
+         suggestions.append("Ensure optimizer.step() and optimizer.zero_grad() are placed correctly in the training loop.")
+         score -= 0.12
+
+     if "CrossEntropyLoss" in code and "softmax(" in code:
+         suggestions.append("CrossEntropyLoss expects raw logits; remove the explicit softmax before the loss when possible.")
+         score -= 0.05
+
+     if not suggestions:
+         suggestions.append("Add explicit train/eval mode transitions and log validation metrics during training.")
+
+     return DomainAnalysis(
+         domain="ml_dl",
+         domain_score=max(0.05, round(score, 4)),
+         issues=issues,
+         suggestions=suggestions,
+         highlights={
+             "uses_torch": float(parsed.get("uses_torch", False)),
+             "has_eval_mode": float("model.eval()" in code),
+             "has_no_grad": float("no_grad" in code),
+             "time_complexity": complexity["time_complexity"],
+         },
+     )
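The `CrossEntropyLoss` check above rests on softmax being applied exactly once: re-softmaxing already-normalized probabilities compresses the distribution and weakens the loss signal. A small stdlib sketch makes the effect visible (the logits here are illustrative values, not from the repository):

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of raw scores."""
    m = max(xs)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 1.0, 0.0]
once = softmax(logits)   # sharp distribution: top class dominates
twice = softmax(once)    # softmax of probabilities: much flatter
```

The double-softmaxed distribution is noticeably flatter than the single pass, which is why the analyzer suggests passing raw logits to `CrossEntropyLoss`.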
analyzers/web_analyzer.py ADDED
@@ -0,0 +1,50 @@
+ """Analyzer for FastAPI and backend web-service code."""
+
+ from __future__ import annotations
+
+ from typing import Any, Dict
+
+ from schemas.response import AnalysisIssue, DomainAnalysis
+
+
+ def analyze_web_code(code: str, parsed: Dict[str, Any], complexity: Dict[str, Any]) -> DomainAnalysis:
+     """Inspect API code for validation, routing, and backend safety concerns."""
+
+     issues = []
+     suggestions = []
+     score = 0.76
+
+     route_decorators = set(parsed.get("route_decorators", []))
+     if route_decorators and not parsed.get("uses_pydantic"):
+         issues.append(
+             AnalysisIssue(
+                 title="Request validation model is missing",
+                 severity="high",
+                 description="Route handlers appear present, but no obvious Pydantic validation layer was detected.",
+             )
+         )
+         suggestions.append("Add Pydantic request and response models for strict validation and type-safe contracts.")
+         score -= 0.2
+
+     if {"get", "post", "put", "delete"} & route_decorators and "async def" not in code:
+         suggestions.append("Prefer async FastAPI endpoints when the route performs I/O or awaits downstream services.")
+         score -= 0.08
+
+     if "request.json()" in code or "request.body()" in code:
+         suggestions.append("Validate raw request payloads before use; avoid trusting unchecked JSON input.")
+         score -= 0.08
+
+     if not suggestions:
+         suggestions.append("Add domain-specific response models and centralize dependency injection for cleaner API structure.")
+
+     return DomainAnalysis(
+         domain="web",
+         domain_score=max(0.05, round(score, 4)),
+         issues=issues,
+         suggestions=suggestions,
+         highlights={
+             "route_count": float(len(route_decorators)),
+             "uses_validation": float(parsed.get("uses_pydantic", False)),
+             "time_complexity": complexity["time_complexity"],
+         },
+     )
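The missing-validation issue above is normally solved with Pydantic request models; a dependency-free sketch of the same idea shows what "validate before use" means for a raw payload (the `validate_task_payload` function and its fields are hypothetical, not part of this repository):

```python
def validate_task_payload(payload):
    """Reject untyped or incomplete task payloads before any handler logic runs."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    title = payload.get("title")
    if not isinstance(title, str) or not title.strip():
        raise ValueError("'title' must be a non-empty string")
    priority = payload.get("priority", 1)
    if not isinstance(priority, int) or not 1 <= priority <= 5:
        raise ValueError("'priority' must be an integer from 1 to 5")
    # Return a normalized copy so handlers never see raw input.
    return {"title": title.strip(), "priority": priority}
```

A Pydantic model expresses the same constraints declaratively and, in FastAPI, turns violations into 422 responses automatically, which is why the analyzer scores its absence heavily.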
api/__init__.py ADDED
@@ -0,0 +1,5 @@
+ """FastAPI backend package for the multi-domain analyzer."""
+
+ from .main import app
+
+ __all__ = ["app"]
api/main.py ADDED
@@ -0,0 +1,27 @@
+ """FastAPI backend for the multi-domain AI code analyzer."""
+
+ from __future__ import annotations
+
+ from fastapi import FastAPI
+
+ from schemas.request import AnalyzeCodeRequest
+ from schemas.response import AnalyzeCodeResponse
+ from services.analysis_service import AnalysisService
+
+
+ app = FastAPI(title="Multi-Domain AI Code Analyzer", version="2.0.0")
+ analysis_service = AnalysisService()
+
+
+ @app.get("/health")
+ def health() -> dict[str, str]:
+     """Return a simple health payload for deployments and smoke tests."""
+
+     return {"status": "ok"}
+
+
+ @app.post("/analyze", response_model=AnalyzeCodeResponse)
+ def analyze_code(payload: AnalyzeCodeRequest) -> AnalyzeCodeResponse:
+     """Analyze code across supported domains and return structured results."""
+
+     return analysis_service.analyze(payload)
app/__init__.py ADDED
@@ -0,0 +1 @@
+ """Streamlit UI package for the multi-domain analyzer."""
app/examples.py ADDED
@@ -0,0 +1,31 @@
+ """Example snippets for each supported analysis domain."""
+
+ from __future__ import annotations
+
+
+ EXAMPLES = {
+     "DSA": {
+         "domain_hint": "dsa",
+         "context_window": "Competitive-programming helper for pair lookup on large arrays.",
+         "traceback_text": "",
+         "code": """def two_sum(nums, target):\n    for i in range(len(nums)):\n        for j in range(i + 1, len(nums)):\n            if nums[i] + nums[j] == target:\n                return [i, j]\n    return []\n""",
+     },
+     "Data Science": {
+         "domain_hint": "data_science",
+         "context_window": "Feature engineering step in a churn-prediction notebook.",
+         "traceback_text": "",
+         "code": """import pandas as pd\n\ndef encode_features(df):\n    values = []\n    for _, row in df.iterrows():\n        values.append(row['age'] * row['sessions'])\n    df['score'] = values\n    return df\n""",
+     },
+     "ML / DL": {
+         "domain_hint": "ml_dl",
+         "context_window": "Inference utility for a PyTorch classifier used in a batch review job.",
+         "traceback_text": "",
+         "code": """import torch\n\nclass Predictor:\n    def __init__(self, model):\n        self.model = model\n\n    def predict(self, batch):\n        outputs = self.model(batch)\n        return outputs.argmax(dim=1)\n""",
+     },
+     "Web / FastAPI": {
+         "domain_hint": "web",
+         "context_window": "Backend endpoint for creating review tasks from user-submitted payloads.",
+         "traceback_text": "",
+         "code": """from fastapi import FastAPI, Request\n\napp = FastAPI()\n\n@app.post('/tasks')\ndef create_task(request: Request):\n    payload = request.json()\n    return {'task': payload}\n""",
+     },
+ }
app/streamlit_app.py ADDED
@@ -0,0 +1,100 @@
+ """Streamlit frontend for the multi-domain analyzer platform."""
+
+ from __future__ import annotations
+
+ import streamlit as st
+
+ from app.examples import EXAMPLES
+ from schemas.request import AnalyzeCodeRequest
+ from services.analysis_service import AnalysisService
+
+
+ analysis_service = AnalysisService()
+
+
+ def _analyze(code: str, context_window: str, traceback_text: str, domain_hint: str):
+     """Run the analysis service with validated request payloads."""
+
+     request = AnalyzeCodeRequest(
+         code=code,
+         context_window=context_window,
+         traceback_text=traceback_text,
+         domain_hint=domain_hint,  # type: ignore[arg-type]
+     )
+     return analysis_service.analyze(request)
+
+
+ def main() -> None:
+     """Render the Streamlit UI."""
+
+     st.set_page_config(page_title="Multi-Domain AI Code Analyzer", layout="wide")
+     st.title("Multi-Domain AI Code Analyzer & Improvement System")
+     st.caption("PyTorch-powered code review across DSA, Data Science, ML/DL, and Web backend code.")
+
+     example_name = st.selectbox("Example input", list(EXAMPLES.keys()))
+     example = EXAMPLES[example_name]
+     auto_analyze = st.toggle("Real-time scoring", value=True)
+
+     left, right = st.columns([1.2, 1.0])
+     with left:
+         code = st.text_area("Code input", value=example["code"], height=420)
+         context_window = st.text_area("Context window", value=example["context_window"], height=100)
+         traceback_text = st.text_area("Optional traceback / runtime hint", value=example["traceback_text"], height=100)
+         domain_hint = st.selectbox("Domain hint", ["auto", "dsa", "data_science", "ml_dl", "web"], index=["auto", "dsa", "data_science", "ml_dl", "web"].index(example["domain_hint"]))
+         analyze_clicked = st.button("Analyze Code", type="primary")
+
+     result = None
+     if code and (analyze_clicked or auto_analyze):
+         result = _analyze(code, context_window, traceback_text, domain_hint)
+
+     with right:
+         if result is None:
+             st.info("Paste code or load an example to start analysis.")
+         else:
+             metric_cols = st.columns(4)
+             metric_cols[0].metric("Detected domain", result.detected_domain)
+             metric_cols[1].metric("ML score", f"{result.score_breakdown.ml_score:.0%}")
+             metric_cols[2].metric("Domain score", f"{result.score_breakdown.domain_score:.0%}")
+             metric_cols[3].metric("Reward", f"{result.score_breakdown.reward:.0%}")
+             st.bar_chart(result.domain_confidences)
+             st.caption(result.summary)
+
+     if result is not None:
+         overview_tab, suggestions_tab, domain_tab, static_tab = st.tabs(
+             ["Overview", "Suggestions", "Domain Detail", "Static Analysis"]
+         )
+
+         with overview_tab:
+             st.subheader("Improvement Plan")
+             for step in result.improvement_plan:
+                 st.write(f"- {step}")
+             st.subheader("Complexity")
+             st.write(
+                 {
+                     "time_complexity": result.static_analysis.time_complexity,
+                     "space_complexity": result.static_analysis.space_complexity,
+                     "cyclomatic_complexity": result.static_analysis.cyclomatic_complexity,
+                 }
+             )
+
+         with suggestions_tab:
+             st.subheader("Suggestions")
+             for suggestion in result.domain_analysis.suggestions:
+                 st.write(f"- {suggestion}")
+             if result.domain_analysis.issues:
+                 st.subheader("Issues")
+                 for issue in result.domain_analysis.issues:
+                     st.write(f"- [{issue.severity}] {issue.title}: {issue.description}")
+
+         with domain_tab:
+             st.subheader("Domain Highlights")
+             st.json(result.domain_analysis.highlights)
+             st.write(f"Domain score: {result.domain_analysis.domain_score:.0%}")
+
+         with static_tab:
+             st.subheader("Static Analysis")
+             st.json(result.static_analysis.model_dump())
+
+
+ if __name__ == "__main__":
+     main()
client.py CHANGED
@@ -7,7 +7,7 @@ from typing import Dict
  from openenv.core import EnvClient
  from openenv.core.client_types import StepResult
 
- from .models import (
+ from .Models import (
      PythonCodeReviewAction,
      PythonCodeReviewObservation,
      PythonCodeReviewState,
graders/bug_fix.py CHANGED
@@ -3,10 +3,10 @@
  from __future__ import annotations
 
  try:
-     from ..models import TaskGrade
+     from ..Models import TaskGrade
      from ..tasks.catalog import ReviewTask
  except ImportError:
-     from models import TaskGrade
+     from Models import TaskGrade
      from tasks.catalog import ReviewTask
 
  from .shared import (
graders/dispatch.py CHANGED
@@ -3,10 +3,10 @@
  from __future__ import annotations
 
  try:
-     from ..models import TaskGrade
+     from ..Models import TaskGrade
      from ..tasks.catalog import ReviewTask
  except ImportError:
-     from models import TaskGrade
+     from Models import TaskGrade
      from tasks.catalog import ReviewTask
 
  from .bug_fix import grade_bug_fix_task
graders/optimization.py CHANGED
@@ -3,10 +3,10 @@
  from __future__ import annotations
 
  try:
-     from ..models import TaskGrade
+     from ..Models import TaskGrade
      from ..tasks.catalog import ReviewTask
  except ImportError:
-     from models import TaskGrade
+     from Models import TaskGrade
      from tasks.catalog import ReviewTask
 
  from .shared import (
graders/shared.py CHANGED
@@ -11,10 +11,10 @@ import traceback
  from typing import Any, Callable, Dict, List
 
  try:
-     from ..models import TaskGrade
+     from ..Models import TaskGrade
      from ..tasks.catalog import CallCase, ReviewTask
  except ImportError:
-     from models import TaskGrade
+     from Models import TaskGrade
      from tasks.catalog import CallCase, ReviewTask
 
 
graders/syntax.py CHANGED
@@ -3,10 +3,10 @@
  from __future__ import annotations
 
  try:
-     from ..models import TaskGrade
+     from ..Models import TaskGrade
      from ..tasks.catalog import ReviewTask
  except ImportError:
-     from models import TaskGrade
+     from Models import TaskGrade
      from tasks.catalog import ReviewTask
 
  from .shared import (
inference.py CHANGED
@@ -28,7 +28,7 @@ except Exception:
     PythonCodeReviewEnvironment = None  # type: ignore[assignment]
 
 try:
-    from models import PythonCodeReviewAction
+    from Models import PythonCodeReviewAction
 except Exception:
     PythonCodeReviewAction = None  # type: ignore[assignment]
 
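The `try`/`except ImportError` pairs that this commit rewrites let each module load both as a package submodule (relative import) and as a loose top-level module inside the Docker image. A minimal, self-contained sketch of the same fallback pattern — `missing_package` is a hypothetical name used only to force the except branch, and the stub class is not the project's real `TaskGrade`:

```python
# Dual-mode import: prefer the installed package, fall back to a local stand-in.
try:
    from missing_package import TaskGrade  # hypothetical module; not installed here
except ImportError:
    # Fallback path, analogous to the bare "from Models import ..." branches above.
    class TaskGrade:
        """Minimal stand-in so the rest of the module can still load."""

        def __init__(self, score: float) -> None:
            self.score = score

grade = TaskGrade(0.75)
print(grade.score)  # → 0.75
```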
launch.py ADDED
@@ -0,0 +1,35 @@
+"""Launch the FastAPI backend and Streamlit UI in one Docker container."""
+
+from __future__ import annotations
+
+import subprocess
+import sys
+
+
+def main() -> int:
+    """Start the API backend in the background and keep Streamlit in the foreground."""
+
+    api_process = subprocess.Popen(
+        ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8001"],
+    )
+    try:
+        return subprocess.call(
+            [
+                "streamlit",
+                "run",
+                "app/streamlit_app.py",
+                "--server.port",
+                "8000",
+                "--server.address",
+                "0.0.0.0",
+                "--server.headless",
+                "true",
+            ]
+        )
+    finally:
+        api_process.terminate()
+        api_process.wait(timeout=10)
+
+
+if __name__ == "__main__":
+    sys.exit(main())
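`launch.py` supervises uvicorn as a background child while blocking on Streamlit, then tears the child down in a `finally` block. The same supervise-and-cleanup shape can be exercised with plain `python -c` commands standing in for the two real servers (a sketch, not the actual launcher):

```python
import subprocess
import sys

# Background "server" (stands in for uvicorn): sleeps until terminated.
background = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
try:
    # Foreground "UI" (stands in for Streamlit): exits on its own.
    exit_code = subprocess.call([sys.executable, "-c", "print('ui finished')"])
finally:
    # Mirror launch.py's cleanup: terminate the child, then reap it.
    background.terminate()
    background.wait(timeout=10)

print(exit_code)  # → 0
```

Returning the foreground process's exit code (as `main()` does) lets the container report Streamlit failures to Docker.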
models/__init__.py ADDED
@@ -0,0 +1,5 @@
+"""PyTorch-backed model wrappers for the analyzer platform."""
+
+from .pytorch_model import PyTorchCodeAnalyzerModel
+
+__all__ = ["PyTorchCodeAnalyzerModel"]
models/pytorch_model.py ADDED
@@ -0,0 +1,149 @@
+"""PyTorch + transformers model wrapper for multi-domain code scoring."""
+
+from __future__ import annotations
+
+import hashlib
+from typing import Dict, List, Sequence
+
+import torch
+import torch.nn.functional as F
+
+try:
+    from transformers import AutoModel, AutoTokenizer
+except Exception:
+    AutoModel = None  # type: ignore[assignment]
+    AutoTokenizer = None  # type: ignore[assignment]
+
+
+DOMAIN_PROTOTYPES: Dict[str, List[str]] = {
+    "dsa": [
+        "Binary search, hashmap optimization, recursion, dynamic programming, arrays, trees, graphs, stack, queue, complexity.",
+        "Competitive programming algorithm with loops, memoization, prefix sums, and asymptotic analysis.",
+    ],
+    "data_science": [
+        "Pandas dataframe transformation, numpy vectorization, feature leakage, train test split, iterrows misuse.",
+        "Data cleaning pipeline using pandas, numpy, aggregation, joins, and vectorized operations.",
+    ],
+    "ml_dl": [
+        "PyTorch model, training loop, optimizer, backward pass, eval mode, no_grad, loss function, dataloader.",
+        "Machine learning inference and training code with torch, sklearn, tensors, gradients, and model checkpoints.",
+    ],
+    "web": [
+        "FastAPI endpoint, request validation, Pydantic models, async routes, API security, backend service design.",
+        "REST API backend with routers, dependency injection, input validation, serialization, and error handling.",
+    ],
+    "general": [
+        "General Python utility code with readable structure, typing, tests, and maintainable abstractions.",
+    ],
+}
+
+QUALITY_ANCHORS: Dict[str, List[str]] = {
+    "high": [
+        "Readable typed Python code with validation, efficient algorithms, vectorized operations, safe inference, and clean API boundaries.",
+        "Production-ready code with small functions, docstrings, low complexity, and clear error handling.",
+    ],
+    "low": [
+        "Brute-force nested loops, missing validation, unsafe input handling, missing eval mode, missing no_grad, and code smells.",
+        "Hard to maintain code with high complexity, repeated scans, mutable side effects, and unclear structure.",
+    ],
+}
+
+
+class _HashEmbeddingBackend:
+    """Torch-native fallback when pretrained weights cannot be loaded."""
+
+    def __init__(self, dimensions: int = 128) -> None:
+        self.dimensions = dimensions
+        self.model_id = "hashed-token-fallback"
+        self.backend_name = "hashed-token-fallback"
+        self.notes = ["Using hashed embeddings because pretrained transformer weights are unavailable."]
+
+    def embed_texts(self, texts: Sequence[str]) -> torch.Tensor:
+        matrix = torch.zeros((len(texts), self.dimensions), dtype=torch.float32)
+        for row_index, text in enumerate(texts):
+            tokens = text.lower().split()[:512]
+            if not tokens:
+                matrix[row_index, 0] = 1.0
+                continue
+            for token in tokens:
+                digest = hashlib.md5(token.encode("utf-8")).hexdigest()
+                bucket = int(digest[:8], 16) % self.dimensions
+                sign = -1.0 if int(digest[8:10], 16) % 2 else 1.0
+                matrix[row_index, bucket] += sign
+        return F.normalize(matrix + 1e-6, dim=1)
+
+
+class PyTorchCodeAnalyzerModel:
+    """Score code using pretrained transformer embeddings plus prototype similarity."""
+
+    def __init__(self, model_id: str = "huggingface/CodeBERTa-small-v1") -> None:
+        self.model_id = model_id
+        self.backend_name = model_id
+        self.notes: List[str] = []
+        self._tokenizer = None
+        self._model = None
+        self._fallback = _HashEmbeddingBackend()
+        self._prototype_cache: Dict[str, torch.Tensor] = {}
+
+    def _ensure_loaded(self) -> None:
+        if self._model is not None or self.notes:
+            return
+        if AutoTokenizer is None or AutoModel is None:
+            self.backend_name = self._fallback.backend_name
+            self.notes = list(self._fallback.notes)
+            return
+        try:
+            self._tokenizer = AutoTokenizer.from_pretrained(self.model_id)
+            self._model = AutoModel.from_pretrained(self.model_id)
+            self._model.eval()
+            self.notes.append(f"Loaded pretrained encoder `{self.model_id}`.")
+        except Exception as exc:
+            self.backend_name = self._fallback.backend_name
+            self.notes = list(self._fallback.notes) + [f"Pretrained load failed: {type(exc).__name__}: {exc}"]
+
+    def _embed_texts(self, texts: Sequence[str]) -> torch.Tensor:
+        self._ensure_loaded()
+        if self._model is None or self._tokenizer is None:
+            return self._fallback.embed_texts(texts)
+        encoded = self._tokenizer(list(texts), padding=True, truncation=True, max_length=256, return_tensors="pt")
+        with torch.no_grad():
+            outputs = self._model(**encoded)
+        hidden = outputs.last_hidden_state
+        mask = encoded["attention_mask"].unsqueeze(-1)
+        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
+        return F.normalize(pooled, dim=1)
+
+    def _prototype_matrix(self, bucket: str, texts: Sequence[str]) -> torch.Tensor:
+        if bucket not in self._prototype_cache:
+            self._prototype_cache[bucket] = self._embed_texts(texts)
+        return self._prototype_cache[bucket]
+
+    def predict(self, code: str, context_window: str, static_summary: Dict[str, object]) -> Dict[str, object]:
+        """Predict domain probabilities and a model quality score."""
+
+        document = (
+            f"Code:\n{code.strip()[:4000]}\n\n"
+            f"Context:\n{context_window.strip()[:1000]}\n\n"
+            f"Static hints:\n{static_summary}\n"
+        )
+        candidate = self._embed_texts([document])
+
+        domain_scores: Dict[str, float] = {}
+        for domain, texts in DOMAIN_PROTOTYPES.items():
+            matrix = self._prototype_matrix(f"domain:{domain}", texts)
+            similarity = torch.matmul(candidate, matrix.T).max().item()
+            domain_scores[domain] = round((similarity + 1.0) / 2.0, 4)
+
+        high_matrix = self._prototype_matrix("quality:high", QUALITY_ANCHORS["high"])
+        low_matrix = self._prototype_matrix("quality:low", QUALITY_ANCHORS["low"])
+        high_similarity = torch.matmul(candidate, high_matrix.T).max().item()
+        low_similarity = torch.matmul(candidate, low_matrix.T).max().item()
+        ml_quality_score = torch.sigmoid(torch.tensor((high_similarity - low_similarity) * 4.0)).item()
+
+        return {
+            "domain_scores": domain_scores,
+            "ml_quality_score": round(float(ml_quality_score), 4),
+            "backend_name": self.backend_name,
+            "model_id": self.model_id,
+            "notes": list(self.notes),
+        }
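The `_HashEmbeddingBackend` fallback is plain feature hashing: each token's MD5 digest selects a bucket and a sign. That arithmetic needs nothing from torch, so it can be checked in isolation. A dependency-free sketch of the same mapping (this `hash_bucket` helper is illustrative; the class itself accumulates these values into a tensor and normalizes it):

```python
import hashlib

def hash_bucket(token: str, dimensions: int = 128) -> tuple[int, float]:
    """Map a token to the same (bucket, sign) pair the fallback backend computes."""
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % dimensions        # first 8 hex chars pick the slot
    sign = -1.0 if int(digest[8:10], 16) % 2 else 1.0  # next byte's parity picks the sign
    return bucket, sign

# MD5 is deterministic, so repeated tokens always hit the same signed slot.
b1, s1 = hash_bucket("def")
b2, s2 = hash_bucket("def")
print(b1 == b2 and s1 == s2)        # → True
print(0 <= b1 < 128 and s1 in (-1.0, 1.0))  # → True
```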
pyproject.toml CHANGED
@@ -5,14 +5,18 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "openenv-python-code-review-env"
 version = "1.0.0"
-description = "Production-grade OpenEnv environment for Python code review workflows."
+description = "TorchReview Copilot: AI-powered Python code triage with PyTorch and OpenEnv validation."
 readme = "README.md"
 requires-python = ">=3.10"
 dependencies = [
     "fastapi>=0.111.0",
+    "gradio>=5.26.0",
     "openai>=1.76.0",
     "openenv-core[core]>=0.2.2",
     "pytest>=8.0.0",
+    "streamlit>=1.44.0",
+    "torch>=2.2.0",
+    "transformers>=4.45.0",
     "uvicorn>=0.30.0",
 ]
 
@@ -31,5 +35,12 @@ packages = [
     "python_env.server",
     "python_env.tasks",
     "python_env.graders",
+    "python_env.api",
+    "python_env.app",
+    "python_env.analyzers",
+    "python_env.models",
+    "python_env.schemas",
+    "python_env.services",
+    "python_env.utils",
 ]
-package-dir = { "python_env" = ".", "python_env.server" = "server", "python_env.tasks" = "tasks", "python_env.graders" = "graders" }
+package-dir = { "python_env" = ".", "python_env.server" = "server", "python_env.tasks" = "tasks", "python_env.graders" = "graders", "python_env.api" = "api", "python_env.app" = "app", "python_env.analyzers" = "analyzers", "python_env.models" = "models", "python_env.schemas" = "schemas", "python_env.services" = "services", "python_env.utils" = "utils" }
schemas/__init__.py ADDED
@@ -0,0 +1,13 @@
+"""Public schemas for the multi-domain analysis platform."""
+
+from .request import AnalyzeCodeRequest
+from .response import AnalyzeCodeResponse, AnalysisIssue, DomainAnalysis, ScoreBreakdown, StaticAnalysisSummary
+
+__all__ = [
+    "AnalyzeCodeRequest",
+    "AnalyzeCodeResponse",
+    "AnalysisIssue",
+    "DomainAnalysis",
+    "ScoreBreakdown",
+    "StaticAnalysisSummary",
+]
schemas/request.py ADDED
@@ -0,0 +1,19 @@
+"""Request schemas for code analysis endpoints and UI."""
+
+from __future__ import annotations
+
+from typing import Literal
+
+from pydantic import BaseModel, Field
+
+
+DomainHint = Literal["auto", "dsa", "data_science", "ml_dl", "web"]
+
+
+class AnalyzeCodeRequest(BaseModel):
+    """Validated input payload for multi-domain code analysis."""
+
+    code: str = Field(..., min_length=1, description="Source code to analyze.")
+    context_window: str = Field(default="", max_length=2000, description="Optional repository or task context.")
+    traceback_text: str = Field(default="", max_length=2000, description="Optional runtime or test failure output.")
+    domain_hint: DomainHint = Field(default="auto", description="Optional domain override when auto detection is not desired.")
schemas/response.py ADDED
@@ -0,0 +1,70 @@
+"""Response schemas for the multi-domain analysis platform."""
+
+from __future__ import annotations
+
+from typing import Dict, List, Literal
+
+from pydantic import BaseModel, Field
+
+
+DomainType = Literal["dsa", "data_science", "ml_dl", "web", "general"]
+Severity = Literal["low", "medium", "high"]
+
+
+class AnalysisIssue(BaseModel):
+    """One detected issue or risk in the code snippet."""
+
+    title: str
+    severity: Severity
+    description: str
+    line_hint: int | None = None
+
+
+class StaticAnalysisSummary(BaseModel):
+    """Language-agnostic static-analysis signals."""
+
+    syntax_valid: bool
+    syntax_error: str = ""
+    cyclomatic_complexity: int = Field(..., ge=1)
+    line_count: int = Field(..., ge=0)
+    max_loop_depth: int = Field(..., ge=0)
+    time_complexity: str = "Unknown"
+    space_complexity: str = "Unknown"
+    detected_imports: List[str] = Field(default_factory=list)
+    code_smells: List[str] = Field(default_factory=list)
+
+
+class DomainAnalysis(BaseModel):
+    """Domain-specific analysis payload returned by an analyzer."""
+
+    domain: DomainType
+    domain_score: float = Field(..., ge=0.0, le=1.0)
+    issues: List[AnalysisIssue] = Field(default_factory=list)
+    suggestions: List[str] = Field(default_factory=list)
+    highlights: Dict[str, float | str] = Field(default_factory=dict)
+
+
+class ScoreBreakdown(BaseModel):
+    """Reward inputs and final normalized score."""
+
+    ml_score: float = Field(..., ge=0.0, le=1.0)
+    domain_score: float = Field(..., ge=0.0, le=1.0)
+    lint_score: float = Field(..., ge=0.0, le=1.0)
+    complexity_penalty: float = Field(..., ge=0.0, le=1.0)
+    reward: float = Field(..., ge=0.0, le=1.0)
+
+
+class AnalyzeCodeResponse(BaseModel):
+    """Top-level structured output for API and UI consumers."""
+
+    detected_domain: DomainType
+    domain_confidences: Dict[str, float]
+    score_breakdown: ScoreBreakdown
+    static_analysis: StaticAnalysisSummary
+    domain_analysis: DomainAnalysis
+    improvement_plan: List[str] = Field(default_factory=list)
+    model_backend: str
+    model_id: str
+    summary: str
+    context_window: str = ""
+    analysis_time_ms: float = Field(..., ge=0.0)
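`AnalyzeCodeRequest` rejects empty code and caps both context fields at 2,000 characters. A dependency-free sketch of those same constraints as plain functions (this mirrors the Pydantic `Field` rules above; it is not the model itself and skips `domain_hint`):

```python
def validate_request(code: str, context_window: str = "", traceback_text: str = "") -> dict:
    """Mirror AnalyzeCodeRequest's length constraints without Pydantic."""
    if len(code) < 1:
        raise ValueError("code: min_length=1")
    for name, value in (("context_window", context_window), ("traceback_text", traceback_text)):
        if len(value) > 2000:
            raise ValueError(f"{name}: max_length=2000")
    return {"code": code, "context_window": context_window, "traceback_text": traceback_text}

payload = validate_request("print('hi')", context_window="invoice module")
print(payload["code"])  # → print('hi')
try:
    validate_request("")
except ValueError as exc:
    print(exc)  # → code: min_length=1
```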
server/app.py CHANGED
@@ -1,4 +1,4 @@
-"""FastAPI entrypoint for python_code_review_env."""
+"""FastAPI + Gradio entrypoint for TorchReview Copilot."""
 
 from __future__ import annotations
 
@@ -10,20 +10,36 @@ except Exception as exc:  # pragma: no cover
     ) from exc
 
 try:
+    import gradio as gr
+except Exception:
+    gr = None  # type: ignore[assignment]
+
+try:
-    from ..models import PythonCodeReviewAction, PythonCodeReviewObservation
+    from ..Models import PythonCodeReviewAction, PythonCodeReviewObservation
     from .env import PythonCodeReviewEnvironment
+    from .demo import build_demo
 except ImportError:
-    from models import PythonCodeReviewAction, PythonCodeReviewObservation
+    from Models import PythonCodeReviewAction, PythonCodeReviewObservation
     from server.env import PythonCodeReviewEnvironment
+    from server.demo import build_demo
+
+
+def build_application():
+    """Compose the OpenEnv API with the Gradio demo frontend."""
+
+    api_app = create_app(
+        PythonCodeReviewEnvironment,
+        PythonCodeReviewAction,
+        PythonCodeReviewObservation,
+        env_name="python_code_review_env",
+        max_concurrent_envs=4,
+    )
+    if gr is None:
+        return api_app
+    return gr.mount_gradio_app(api_app, build_demo(), path="/")
 
 
-app = create_app(
-    PythonCodeReviewEnvironment,
-    PythonCodeReviewAction,
-    PythonCodeReviewObservation,
-    env_name="python_code_review_env",
-    max_concurrent_envs=4,
-)
+app = build_application()
 
 
 def main(host: str = "0.0.0.0", port: int = 8000) -> None:
server/demo.py ADDED
@@ -0,0 +1,441 @@
+"""Gradio UI for TorchReview Copilot."""
+
+from __future__ import annotations
+
+from html import escape
+
+import gradio as gr
+
+try:
+    from ..triage import get_default_engine
+except ImportError:
+    from triage import get_default_engine
+
+
+CSS = """
+:root {
+  --paper: #f6f1e8;
+  --ink: #162521;
+  --accent: #d95d39;
+  --panel: #fffdf8;
+  --border: #d6c4b8;
+  --muted: #5f6f67;
+  --good: #2d7d62;
+  --warn: #b76516;
+  --high: #b23a48;
+}
+
+body, .gradio-container {
+  background:
+    radial-gradient(circle at top left, rgba(247, 197, 159, 0.35), transparent 35%),
+    linear-gradient(135deg, #f9f6ef 0%, #efe5d3 100%);
+  color: var(--ink);
+  font-family: Georgia, "Times New Roman", serif;
+}
+
+.gradio-container {
+  max-width: 1260px !important;
+}
+
+.hero-card,
+.metric-card,
+.subtle-card {
+  background: rgba(255, 253, 248, 0.95);
+  border: 1px solid var(--border);
+  border-radius: 20px;
+  box-shadow: 0 16px 40px rgba(22, 37, 33, 0.08);
+}
+
+.hero-card {
+  padding: 28px 30px;
+  margin-bottom: 12px;
+}
+
+.metric-card,
+.subtle-card {
+  padding: 20px 22px;
+}
+
+.eyebrow {
+  text-transform: uppercase;
+  letter-spacing: 0.12em;
+  font-size: 12px;
+  color: var(--accent);
+  margin-bottom: 10px;
+}
+
+.hero-title {
+  font-size: 44px;
+  line-height: 1.05;
+  margin: 0 0 10px;
+}
+
+.hero-copy {
+  margin: 0;
+  font-size: 18px;
+  line-height: 1.55;
+  color: var(--muted);
+}
+
+.summary-title {
+  display: flex;
+  justify-content: space-between;
+  gap: 12px;
+  align-items: center;
+  margin-bottom: 14px;
+}
+
+.pill {
+  display: inline-block;
+  padding: 6px 12px;
+  border-radius: 999px;
+  font-size: 12px;
+  text-transform: uppercase;
+  letter-spacing: 0.08em;
+  background: #efe5d3;
+}
+
+.pill.low { color: var(--good); }
+.pill.medium { color: var(--warn); }
+.pill.high { color: var(--high); }
+
+.summary-grid {
+  display: grid;
+  grid-template-columns: repeat(2, minmax(0, 1fr));
+  gap: 12px;
+  margin-top: 16px;
+}
+
+.summary-stat {
+  background: #fff7ef;
+  border-radius: 14px;
+  padding: 12px 14px;
+  border: 1px solid rgba(214, 196, 184, 0.8);
+}
+
+.summary-stat strong {
+  display: block;
+  font-size: 12px;
+  text-transform: uppercase;
+  letter-spacing: 0.08em;
+  color: var(--muted);
+  margin-bottom: 6px;
+}
+
+.radar-wrap {
+  display: grid;
+  gap: 12px;
+}
+
+.bar {
+  display: grid;
+  gap: 6px;
+}
+
+.bar-head {
+  display: flex;
+  justify-content: space-between;
+  font-size: 13px;
+  color: var(--muted);
+}
+
+.bar-track {
+  width: 100%;
+  height: 12px;
+  background: #f2e5d6;
+  border-radius: 999px;
+  overflow: hidden;
+}
+
+.bar-fill {
+  height: 100%;
+  border-radius: 999px;
+}
+
+.matched-box {
+  background: #fff7ef;
+  border: 1px solid rgba(214, 196, 184, 0.8);
+  border-radius: 16px;
+  padding: 14px;
+}
+
+.how-grid {
+  display: grid;
+  grid-template-columns: repeat(4, minmax(0, 1fr));
+  gap: 12px;
+}
+
+.how-step {
+  background: rgba(255, 253, 248, 0.9);
+  border: 1px solid var(--border);
+  border-radius: 18px;
+  padding: 16px;
+}
+
+@media (max-width: 900px) {
+  .hero-title {
+    font-size: 34px;
+  }
+
+  .summary-grid,
+  .how-grid {
+    grid-template-columns: 1fr;
+  }
+}
+"""
+
+
+def _default_outputs() -> tuple[str, str, str, str, str]:
+    return (
+        "<div class='metric-card'><div class='eyebrow'>Awaiting Analysis</div><p class='hero-copy'>Paste Python code, add an optional traceback, or load one of the built-in examples.</p></div>",
+        "<div class='metric-card'><div class='eyebrow'>Live Triage Radar</div><p class='hero-copy'>Confidence bars will appear after the first analysis run.</p></div>",
+        "### Improvement Plan\nAnalyze a sample to generate syntax, edge-case, and scalability recommendations.",
+        "### Known Pattern Match\nThe nearest OpenEnv task will be highlighted here after inference runs.",
+        "### Model Notes\nBackend and extracted signal details will appear here.",
+    )
+
+
+def _summary_html(result) -> str:
+    issue = escape(result.issue_label.title())
+    summary = escape(result.summary)
+    next_action = escape(result.suggested_next_action)
+    return f"""
+    <div class="metric-card">
+        <div class="summary-title">
+            <div>
+                <div class="eyebrow">TorchReview Verdict</div>
+                <h3 style="margin:0;font-size:30px;">{issue} Issue</h3>
+            </div>
+            <span class="pill {escape(result.repair_risk)}">{escape(result.repair_risk)} repair risk</span>
+        </div>
+        <p class="hero-copy">{summary}</p>
+        <div class="summary-grid">
+            <div class="summary-stat">
+                <strong>Reward Score</strong>
+                {result.reward_score:.0%}
+            </div>
+            <div class="summary-stat">
+                <strong>ML Quality</strong>
+                {result.ml_quality_score:.0%}
+            </div>
+            <div class="summary-stat">
+                <strong>Matched Pattern</strong>
+                {escape(result.matched_pattern.title)}
+            </div>
+            <div class="summary-stat">
+                <strong>Inference Backend</strong>
+                {escape(result.model_backend)}
+            </div>
+            <div class="summary-stat">
+                <strong>Lint Score</strong>
+                {result.lint_score:.0%}
+            </div>
+            <div class="summary-stat">
+                <strong>Complexity Penalty</strong>
+                {result.complexity_penalty:.0%}
+            </div>
+            <div class="summary-stat">
+                <strong>Next Action</strong>
+                {next_action}
+            </div>
+        </div>
+    </div>
+    """
+
+
+def _radar_html(result) -> str:
+    colors = {
+        "syntax": "#d95d39",
+        "logic": "#4f772d",
+        "performance": "#355070",
+    }
+    bars = []
+    for label, score in result.confidence_scores.items():
+        bars.append(
+            f"""
+            <div class="bar">
+                <div class="bar-head"><span>{escape(label.title())}</span><span>{score:.0%}</span></div>
+                <div class="bar-track">
+                    <div class="bar-fill" style="width:{score * 100:.1f}%; background:{colors.get(label, '#d95d39')};"></div>
+                </div>
+            </div>
+            """
+        )
+    return f"""
+    <div class="metric-card radar-wrap">
+        <div class="eyebrow">Live Triage Radar</div>
+        {''.join(bars)}
+        <div class="matched-box">
+            <strong>Nearest Known Pattern:</strong> {escape(result.matched_pattern.title)}<br>
+            <span style="color:#5f6f67;">{escape(result.matched_pattern.summary)}</span>
+        </div>
+    </div>
+    """
+
+
+def _plan_markdown(result) -> str:
+    plan_lines = "\n".join(f"{index + 1}. {step}" for index, step in enumerate(result.repair_plan))
+    return (
+        "### Improvement Plan\n"
+        f"**Primary issue:** `{result.issue_label}`\n\n"
+        f"{plan_lines}\n\n"
+        f"**Suggested next action:** {result.suggested_next_action}"
+    )
+
+
+def _match_markdown(result) -> str:
+    return (
+        "### Known Pattern Match\n"
+        f"**Task:** `{result.matched_pattern.task_id}` \n"
+        f"**Title:** {result.matched_pattern.title} \n"
+        f"**Why it matched:** {result.matched_pattern.rationale} \n"
+        f"**Similarity:** {result.matched_pattern.similarity:.0%}"
+    )
+
+
+def _model_markdown(result) -> str:
+    signal_lines = "\n".join(
+        f"- `{signal.name}` -> {signal.value} ({signal.impact}, weight {signal.weight:.2f}): {signal.evidence}"
+        for signal in result.extracted_signals
+    ) or "- No strong static signals were extracted."
+    notes = "\n".join(f"- {item}" for item in result.inference_notes) or "- No additional backend notes."
+    return (
+        "### Model Notes\n"
+        f"- **Model backend:** `{result.model_backend}`\n"
+        f"- **Model id:** `{result.model_id}`\n"
+        f"- **Analysis time:** `{result.analysis_time_ms:.2f} ms`\n\n"
+        "### Reward Formula\n"
+        f"- `reward = (0.5 x {result.ml_quality_score:.2f}) + (0.3 x {result.lint_score:.2f}) - (0.2 x {result.complexity_penalty:.2f})`\n"
+        f"- **Final reward:** `{result.reward_score:.2f}`\n\n"
+        "### Extracted Signals\n"
+        f"{signal_lines}\n\n"
+        "### Backend Notes\n"
+        f"{notes}"
+    )
+
+
+def analyze_inputs(code: str, traceback_text: str, context_window: str) -> tuple[str, str, str, str, str]:
+    """Run the triage engine and format outputs for the Gradio UI."""
+
+    result = get_default_engine().triage(code or "", traceback_text or "", context_window or "")
+    return (
+        _summary_html(result),
+        _radar_html(result),
+        _plan_markdown(result),
+        _match_markdown(result),
+        _model_markdown(result),
+    )
+
+
+def load_example(example_key: str) -> tuple[str, str, str, str, str, str, str, str, str]:
+    """Populate the UI from a built-in example and immediately analyze it."""
+
+    example = get_default_engine().example_map()[example_key]
+    outputs = analyze_inputs(example.code, example.traceback_text, example.context_window)
+    header = (
+        f"### Example Scenario\n"
+        f"**{example.title}** \n"
+        f"{example.summary} \n"
+        f"Label target: `{example.label}`"
+    )
+    return (example.code, example.traceback_text, example.context_window, header, *outputs)
+
+
+def build_demo() -> gr.Blocks:
+    """Create the TorchReview Copilot Gradio application."""
+
+    examples = get_default_engine().example_map()
+    first_example = next(iter(examples.values()))
+
+    with gr.Blocks(theme=gr.themes.Soft(primary_hue="orange", secondary_hue="amber"), css=CSS, title="TorchReview Copilot") as demo:
+        gr.HTML(
+            """
+            <div class="hero-card">
+                <div class="eyebrow">Meta PyTorch OpenEnv Hackathon Demo</div>
+                <h1 class="hero-title">TorchReview Copilot</h1>
+                <p class="hero-copy">
+                    AI-powered code review and improvement system using PyTorch to score code quality, surface bugs,
+                    and generate a three-step improvement plan. OpenEnv stays underneath as the deterministic validation engine.
+                </p>
+            </div>
+            """
+        )
+
+        with gr.Row():
+            with gr.Column(scale=6):
+                example_choice = gr.Radio(
+                    choices=[(item.title, item.key) for item in examples.values()],
+                    value=first_example.key,
+                    label="Try a built-in failure scenario",
+                    info="Switching examples updates the Live Triage Radar immediately.",
+                )
+                example_header = gr.Markdown()
+                code_input = gr.Code(
+                    value=first_example.code,
+                    language="python",
+                    lines=18,
+                    label="Python code under review",
+                )
+                traceback_input = gr.Textbox(
+                    value=first_example.traceback_text,
+                    lines=7,
+                    label="Optional traceback / failing test output",
+                    placeholder="Paste stack traces, assertion failures, or benchmark notes here.",
+                )
+                context_input = gr.Textbox(
+                    value=first_example.context_window,
+                    lines=4,
+                    label="Context window",
+                    placeholder="Describe expected behavior, constraints, or repository context.",
+                )
+                with gr.Row():
+                    analyze_button = gr.Button("Analyze & Score Code", variant="primary")
+                    clear_button = gr.Button("Clear Inputs", variant="secondary")
+
+            with gr.Column(scale=5):
+                summary_html = gr.HTML()
+                radar_html = gr.HTML()
+                plan_markdown = gr.Markdown()
+                match_markdown = gr.Markdown()
+                model_markdown = gr.Markdown()
+
+        gr.HTML(
+            """
+            <div class="subtle-card" style="margin-top: 12px;">
+                <div class="eyebrow">How It Works</div>
+                <div class="how-grid">
+                    <div class="how-step"><strong>Input</strong><br>Code plus optional traceback or benchmark signal.</div>
+                    <div class="how-step"><strong>Processing</strong><br>Static checks extract parser, lint, complexity, and runtime clues.</div>
+                    <div class="how-step"><strong>Model</strong><br>CodeBERTa embeddings run through PyTorch and score code quality against known OpenEnv patterns.</div>
+                    <div class="how-step"><strong>Output</strong><br>Confidence radar, reward score, and a three-step improvement plan.</div>
+                </div>
+            </div>
+            """
+        )
+
+        example_choice.change(
+            fn=load_example,
+            inputs=example_choice,
+            outputs=[code_input, traceback_input, context_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+            show_progress="hidden",
+        )
+        analyze_button.click(
+            fn=analyze_inputs,
+            inputs=[code_input, traceback_input, context_input],
+            outputs=[summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+            show_progress="minimal",
+        )
+        clear_button.click(
+            fn=lambda: ("", "", "", "### Example Scenario\nChoose a built-in example or paste custom code.", *_default_outputs()),
+            inputs=None,
+            outputs=[code_input, traceback_input, context_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+            show_progress="hidden",
+        )
+        demo.load(
+            fn=load_example,
+            inputs=example_choice,
+            outputs=[code_input, traceback_input, context_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+            show_progress="hidden",
+        )
+
+    return demo
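The Reward Formula panel in `_model_markdown` displays the scoring blend `reward = 0.5 x ml_quality + 0.3 x lint - 0.2 x complexity_penalty`. A sketch that reproduces just that arithmetic (the clamp to [0, 1] is an assumption inferred from `ScoreBreakdown`'s `ge=0.0, le=1.0` bounds; only the weights appear in the UI string):

```python
def reward(ml_quality: float, lint: float, complexity_penalty: float) -> float:
    """Blend the three signals with the weights shown in the Model Notes panel."""
    raw = 0.5 * ml_quality + 0.3 * lint - 0.2 * complexity_penalty
    return max(0.0, min(1.0, raw))  # assumed clamp; ScoreBreakdown requires [0, 1]

# 0.5*0.8 + 0.3*0.9 - 0.2*0.25 = 0.40 + 0.27 - 0.05
print(round(reward(0.8, 0.9, 0.25), 2))  # → 0.62
print(reward(0.0, 0.0, 1.0))             # → 0.0 (clamped from -0.2)
```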
server/env.py CHANGED
@@ -11,7 +11,7 @@ from openenv.core.env_server.types import EnvironmentMetadata
 try:
     from ..graders import grade_task
     from ..graders.shared import component_score, safe_ratio, strict_score
-    from ..models import (
+    from ..Models import (
         HistoryEntry,
         PythonCodeReviewAction,
         PythonCodeReviewObservation,
@@ -23,7 +23,7 @@ try:
 except ImportError:
     from graders import grade_task
     from graders.shared import component_score, safe_ratio, strict_score
-    from models import (
+    from Models import (
         HistoryEntry,
         PythonCodeReviewAction,
         PythonCodeReviewObservation,
server/requirements.txt CHANGED
@@ -1,5 +1,9 @@
 openenv-core[core]>=0.2.2
 fastapi>=0.111.0
+gradio>=5.26.0
 uvicorn>=0.30.0
 pytest>=8.0.0
 openai>=1.76.0
+streamlit>=1.44.0
+torch>=2.2.0
+transformers>=4.45.0
services/__init__.py ADDED
@@ -0,0 +1,7 @@
+ """Service layer for orchestrating analysis, suggestions, and rewards."""
+
+ from .analysis_service import AnalysisService
+ from .reward_service import RewardService
+ from .suggestion_service import SuggestionService
+
+ __all__ = ["AnalysisService", "RewardService", "SuggestionService"]
services/analysis_service.py ADDED
@@ -0,0 +1,133 @@
+ """Orchestration layer for multi-domain code analysis."""
+
+ from __future__ import annotations
+
+ import time
+ from typing import Any, Callable, Dict
+
+ from analyzers import analyze_data_science_code, analyze_dsa_code, analyze_ml_code, analyze_web_code
+ from models import PyTorchCodeAnalyzerModel
+ from schemas.request import AnalyzeCodeRequest
+ from schemas.response import AnalyzeCodeResponse, DomainAnalysis, StaticAnalysisSummary
+ from services.reward_service import RewardService
+ from services.suggestion_service import SuggestionService
+ from utils import estimate_complexity, parse_code_structure
+
+
+ def _lint_score(parsed: Dict[str, Any]) -> float:
+     """Convert structural smells into a normalized lint-style score."""
+
+     score = 1.0
+     if not parsed.get("syntax_valid", True):
+         score -= 0.45
+     score -= min(parsed.get("long_lines", 0), 5) * 0.03
+     if parsed.get("tabs_used"):
+         score -= 0.1
+     if parsed.get("trailing_whitespace_lines"):
+         score -= 0.05
+     if parsed.get("docstring_ratio", 0.0) == 0.0 and parsed.get("function_names"):
+         score -= 0.08
+     return round(max(0.0, min(1.0, score)), 4)
+
+
+ class AnalysisService:
+     """End-to-end analysis pipeline shared by API and UI."""
+
+     def __init__(self) -> None:
+         self.model = PyTorchCodeAnalyzerModel()
+         self.reward_service = RewardService()
+         self.suggestion_service = SuggestionService()
+         self._analyzers: Dict[str, Callable[[str, Dict[str, Any], Dict[str, Any]], DomainAnalysis]] = {
+             "dsa": analyze_dsa_code,
+             "data_science": analyze_data_science_code,
+             "ml_dl": analyze_ml_code,
+             "web": analyze_web_code,
+         }
+
+     def _heuristic_domain_scores(self, parsed: Dict[str, Any], code: str) -> Dict[str, float]:
+         """Derive domain priors from imports and syntax-level hints."""
+
+         scores = {
+             "dsa": 0.2 + (0.15 if parsed.get("uses_recursion") else 0.0) + (0.15 if parsed.get("max_loop_depth", 0) >= 1 else 0.0),
+             "data_science": 0.2 + (0.35 if parsed.get("uses_pandas") or parsed.get("uses_numpy") else 0.0),
+             "ml_dl": 0.2 + (0.35 if parsed.get("uses_torch") or parsed.get("uses_sklearn") else 0.0),
+             "web": 0.2 + (0.35 if parsed.get("uses_fastapi") or parsed.get("uses_flask") else 0.0) + (0.1 if parsed.get("route_decorators") else 0.0),
+             "general": 0.2,
+         }
+         if "fastapi" in code.lower():
+             scores["web"] += 0.1
+         if "pandas" in code.lower() or "numpy" in code.lower():
+             scores["data_science"] += 0.1
+         if "torch" in code.lower():
+             scores["ml_dl"] += 0.1
+         if "while" in code or "for" in code:
+             scores["dsa"] += 0.05
+         return {key: round(min(value, 0.99), 4) for key, value in scores.items()}
+
+     def analyze(self, request: AnalyzeCodeRequest) -> AnalyzeCodeResponse:
+         """Run the complete multi-domain analysis pipeline."""
+
+         started = time.perf_counter()
+         parsed = parse_code_structure(request.code)
+         complexity = estimate_complexity(parsed, request.code)
+         model_prediction = self.model.predict(request.code, request.context_window, parsed)
+         heuristic_scores = self._heuristic_domain_scores(parsed, request.code)
+
+         combined_scores = {}
+         for domain, heuristic_score in heuristic_scores.items():
+             model_score = float(model_prediction["domain_scores"].get(domain, 0.2))
+             combined_scores[domain] = round((0.6 * model_score) + (0.4 * heuristic_score), 4)
+
+         detected_domain = request.domain_hint if request.domain_hint != "auto" else max(combined_scores, key=combined_scores.get)
+         analyzer = self._analyzers.get(detected_domain)
+         domain_analysis = (
+             analyzer(request.code, parsed, complexity)
+             if analyzer is not None
+             else DomainAnalysis(
+                 domain="general",
+                 domain_score=0.6,
+                 issues=[],
+                 suggestions=["Add stronger domain-specific context for deeper analysis."],
+                 highlights={},
+             )
+         )
+
+         lint_score = _lint_score(parsed)
+         score_breakdown = self.reward_service.compute(
+             ml_score=float(model_prediction["ml_quality_score"]),
+             domain_score=domain_analysis.domain_score,
+             lint_score=lint_score,
+             complexity_penalty=float(complexity["complexity_penalty"]),
+         )
+         static_analysis = StaticAnalysisSummary(
+             syntax_valid=bool(parsed["syntax_valid"]),
+             syntax_error=str(parsed["syntax_error"]),
+             cyclomatic_complexity=int(complexity["cyclomatic_complexity"]),
+             line_count=int(parsed["line_count"]),
+             max_loop_depth=int(parsed["max_loop_depth"]),
+             time_complexity=str(complexity["time_complexity"]),
+             space_complexity=str(complexity["space_complexity"]),
+             detected_imports=list(parsed["imports"]),
+             code_smells=list(parsed["code_smells"]),
+         )
+         improvement_plan = self.suggestion_service.build_improvement_plan(
+             domain_analysis=domain_analysis,
+             static_analysis=static_analysis,
+         )
+         summary = (
+             f"Detected `{detected_domain}` code with a model score of {score_breakdown.ml_score:.0%}, "
+             f"domain score {score_breakdown.domain_score:.0%}, and final reward {score_breakdown.reward:.0%}."
+         )
+         return AnalyzeCodeResponse(
+             detected_domain=detected_domain,  # type: ignore[arg-type]
+             domain_confidences=combined_scores,
+             score_breakdown=score_breakdown,
+             static_analysis=static_analysis,
+             domain_analysis=domain_analysis,
+             improvement_plan=improvement_plan,
+             model_backend=str(model_prediction["backend_name"]),
+             model_id=str(model_prediction["model_id"]),
+             summary=summary,
+             context_window=request.context_window,
+             analysis_time_ms=round((time.perf_counter() - started) * 1000.0, 2),
+         )
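The 60/40 blend of model prediction and heuristic priors in `analyze` is easy to verify in isolation. Below is a minimal sketch of just that combination step; the `blend_domain_scores` helper name and the sample scores are illustrative, not part of the repo:

```python
def blend_domain_scores(model_scores, heuristic_scores):
    """Blend model and heuristic domain scores, mirroring the weighting in analyze()."""
    combined = {}
    for domain, heuristic in heuristic_scores.items():
        # Domains the model did not score fall back to the 0.2 prior.
        model = float(model_scores.get(domain, 0.2))
        combined[domain] = round((0.6 * model) + (0.4 * heuristic), 4)
    return combined

scores = blend_domain_scores({"web": 0.9}, {"web": 0.65, "dsa": 0.25})
print(scores)                       # {'web': 0.8, 'dsa': 0.22}
print(max(scores, key=scores.get))  # 'web' wins the domain detection
```

Because the model score carries the larger weight, a confident model prediction can override weak import-based heuristics, while unscored domains decay toward the shared prior.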
services/reward_service.py ADDED
@@ -0,0 +1,27 @@
+ """Reward shaping logic for RL-ready code analysis scores."""
+
+ from __future__ import annotations
+
+ from schemas.response import ScoreBreakdown
+
+
+ class RewardService:
+     """Compute reward scores from model, domain, lint, and complexity signals."""
+
+     def compute(self, *, ml_score: float, domain_score: float, lint_score: float, complexity_penalty: float) -> ScoreBreakdown:
+         """Apply the weighted reward formula and clamp the result."""
+
+         reward = max(
+             0.0,
+             min(
+                 1.0,
+                 (0.4 * ml_score) + (0.2 * domain_score) + (0.2 * lint_score) - (0.2 * complexity_penalty),
+             ),
+         )
+         return ScoreBreakdown(
+             ml_score=round(ml_score, 4),
+             domain_score=round(domain_score, 4),
+             lint_score=round(lint_score, 4),
+             complexity_penalty=round(complexity_penalty, 4),
+             reward=round(reward, 4),
+         )
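The weighted formula in `RewardService.compute` can be exercised on its own. The sketch below reproduces just the arithmetic, returning a plain float instead of the repo's `ScoreBreakdown` model. Note that the positive weights sum to 0.8, so even perfect signals cap the reward at 0.8 before the clamp ever matters:

```python
def compute_reward(ml_score, domain_score, lint_score, complexity_penalty):
    # Same weights as RewardService.compute: the ML quality score dominates,
    # domain fit and lint quality share the middle, and complexity subtracts.
    raw = (0.4 * ml_score) + (0.2 * domain_score) + (0.2 * lint_score) - (0.2 * complexity_penalty)
    return round(max(0.0, min(1.0, raw)), 4)

print(compute_reward(1.0, 1.0, 1.0, 0.0))  # 0.8 - the positive weights sum to 0.8
print(compute_reward(0.5, 0.5, 0.5, 0.5))  # 0.3
print(compute_reward(0.0, 0.0, 0.0, 1.0))  # 0.0 - clamped; the raw value was -0.2
```

The asymmetric clamp means the complexity penalty can zero out a reward but never drive it negative, which keeps the value usable as an RL reward signal.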
services/suggestion_service.py ADDED
@@ -0,0 +1,28 @@
+ """Suggestion and improvement-plan generation for analyzed code."""
+
+ from __future__ import annotations
+
+ from schemas.response import DomainAnalysis, StaticAnalysisSummary
+
+
+ class SuggestionService:
+     """Build high-signal improvement steps from analysis output."""
+
+     def build_improvement_plan(self, *, domain_analysis: DomainAnalysis, static_analysis: StaticAnalysisSummary) -> list[str]:
+         """Return a compact three-step plan optimized for developer action."""
+
+         primary_issue = (
+             domain_analysis.issues[0].description
+             if domain_analysis.issues
+             else "Stabilize correctness first and keep the public behavior explicit."
+         )
+
+         step_one = f"Step 1 - Correctness and safety: {primary_issue}"
+         step_two = "Step 2 - Edge cases: test empty inputs, boundary values, malformed payloads, and failure-mode behavior explicitly."
+         step_three = "Step 3 - Scalability: reduce repeated scans, lower cyclomatic complexity, and benchmark the path on realistic input sizes."
+
+         if domain_analysis.suggestions:
+             step_three = f"{step_three} Priority hint: {domain_analysis.suggestions[0]}"
+         if not static_analysis.syntax_valid:
+             step_one = f"Step 1 - Correctness and safety: fix the syntax error first ({static_analysis.syntax_error})."
+         return [step_one, step_two, step_three]
tests/test_multi_domain_platform.py ADDED
@@ -0,0 +1,52 @@
+ from __future__ import annotations
+
+ from fastapi.testclient import TestClient
+
+ from api.main import app
+ from schemas.request import AnalyzeCodeRequest
+ from services.analysis_service import AnalysisService
+
+
+ def test_analysis_service_detects_web_code() -> None:
+     service = AnalysisService()
+     request = AnalyzeCodeRequest(
+         code="from fastapi import FastAPI\napp = FastAPI()\n\n@app.get('/health')\ndef health():\n    return {'status': 'ok'}\n",
+         domain_hint="auto",
+     )
+
+     result = service.analyze(request)
+
+     assert result.detected_domain == "web"
+     assert 0.0 <= result.score_breakdown.reward <= 1.0
+     assert len(result.improvement_plan) == 3
+
+
+ def test_analysis_service_detects_dsa_code() -> None:
+     service = AnalysisService()
+     request = AnalyzeCodeRequest(
+         code="def has_pair(nums, target):\n    for i in range(len(nums)):\n        for j in range(i + 1, len(nums)):\n            if nums[i] + nums[j] == target:\n                return True\n    return False\n",
+         domain_hint="auto",
+     )
+
+     result = service.analyze(request)
+
+     assert result.detected_domain == "dsa"
+     assert result.static_analysis.time_complexity in {"O(n^2)", "O(n^3)"}
+
+
+ def test_api_analyze_endpoint_returns_valid_payload() -> None:
+     client = TestClient(app)
+     response = client.post(
+         "/analyze",
+         json={
+             "code": "import torch\n\ndef predict(model, x):\n    return model(x)\n",
+             "context_window": "Inference helper for a classifier",
+             "traceback_text": "",
+             "domain_hint": "auto",
+         },
+     )
+
+     assert response.status_code == 200
+     payload = response.json()
+     assert "detected_domain" in payload
+     assert "score_breakdown" in payload
@@ -1,7 +1,7 @@
1
  from __future__ import annotations
2
 
3
  from graders import grade_task
4
- from models import PythonCodeReviewAction
5
  from server.env import PythonCodeReviewEnvironment
6
  from tasks import list_tasks
7
 
 
1
  from __future__ import annotations
2
 
3
  from graders import grade_task
4
+ from Models import PythonCodeReviewAction
5
  from server.env import PythonCodeReviewEnvironment
6
  from tasks import list_tasks
7
 
tests/test_triage_pipeline.py ADDED
@@ -0,0 +1,46 @@
+ from __future__ import annotations
+
+ from fastapi.testclient import TestClient
+
+ from triage import CodeTriageEngine, HashingEmbeddingBackend
+ from triage_catalog import build_examples
+
+
+ def test_hashing_backend_returns_normalized_embeddings() -> None:
+     backend = HashingEmbeddingBackend(dimensions=32)
+     embeddings = backend.embed_texts(["def foo():\n    return 1", "for x in items:\n    pass"])
+
+     assert embeddings.shape == (2, 32)
+     for row in embeddings:
+         assert round(float(row.norm().item()), 5) == 1.0
+
+
+ def test_examples_map_to_expected_labels_with_fallback_backend() -> None:
+     examples = build_examples()
+     engine = CodeTriageEngine(backend=HashingEmbeddingBackend())
+
+     for example in examples:
+         result = engine.triage(example.code, example.traceback_text, example.context_window)
+         assert result.issue_label == example.label
+         assert 0.0 <= result.reward_score <= 1.0
+
+
+ def test_syntax_example_exposes_parser_signal() -> None:
+     example = next(item for item in build_examples() if item.label == "syntax")
+     engine = CodeTriageEngine(backend=HashingEmbeddingBackend())
+
+     result = engine.triage(example.code, example.traceback_text, example.context_window)
+
+     assert any(signal.name == "syntax_parse" and signal.value == "fails" for signal in result.extracted_signals)
+     assert result.matched_pattern.task_id == example.task_id
+     assert result.repair_plan[0].startswith("Step 1 - Syntax checking and bug fixes")
+
+
+ def test_composed_app_preserves_health_route() -> None:
+     from server.app import build_application
+
+     client = TestClient(build_application())
+     response = client.get("/health")
+
+     assert response.status_code == 200
+     assert response.json()["status"] == "ok"
triage.py ADDED
@@ -0,0 +1,473 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """PyTorch-backed triage pipeline for TorchReview Copilot."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import ast
6
+ import hashlib
7
+ import os
8
+ import re
9
+ import time
10
+ from functools import lru_cache
11
+ from typing import List, Sequence
12
+
13
+ import torch
14
+ import torch.nn.functional as F
15
+
16
+ try:
17
+ from transformers import AutoModel, AutoTokenizer
18
+ except Exception:
19
+ AutoModel = None # type: ignore[assignment]
20
+ AutoTokenizer = None # type: ignore[assignment]
21
+
22
+ try:
23
+ from .triage_catalog import build_examples, build_prototypes
24
+ from .triage_models import (
25
+ IssueLabel,
26
+ PrototypeMatch,
27
+ TriageExample,
28
+ TriagePrototype,
29
+ TriageResult,
30
+ TriageSignal,
31
+ )
32
+ except ImportError:
33
+ from triage_catalog import build_examples, build_prototypes
34
+ from triage_models import (
35
+ IssueLabel,
36
+ PrototypeMatch,
37
+ TriageExample,
38
+ TriagePrototype,
39
+ TriageResult,
40
+ TriageSignal,
41
+ )
42
+
43
+
44
+ MODEL_ID = os.getenv("TRIAGE_MODEL_ID", "huggingface/CodeBERTa-small-v1")
45
+ MODEL_MAX_LENGTH = int(os.getenv("TRIAGE_MODEL_MAX_LENGTH", "256"))
46
+ LABELS: tuple[IssueLabel, ...] = ("syntax", "logic", "performance")
47
+
48
+
49
+ class _LoopDepthVisitor(ast.NodeVisitor):
50
+ """Track the maximum loop nesting depth in a code snippet."""
51
+
52
+ def __init__(self) -> None:
53
+ self.depth = 0
54
+ self.max_depth = 0
55
+
56
+ def _visit_loop(self, node: ast.AST) -> None:
57
+ self.depth += 1
58
+ self.max_depth = max(self.max_depth, self.depth)
59
+ self.generic_visit(node)
60
+ self.depth -= 1
61
+
62
+ def visit_For(self, node: ast.For) -> None: # noqa: N802
63
+ self._visit_loop(node)
64
+
65
+ def visit_While(self, node: ast.While) -> None: # noqa: N802
66
+ self._visit_loop(node)
67
+
68
+ def visit_comprehension(self, node: ast.comprehension) -> None: # noqa: N802
69
+ self._visit_loop(node)
70
+
71
+
72
+ class HashingEmbeddingBackend:
73
+ """Deterministic torch-native fallback when pretrained weights are unavailable."""
74
+
75
+ def __init__(self, dimensions: int = 96) -> None:
76
+ self.dimensions = dimensions
77
+ self.model_id = "hashed-token-fallback"
78
+ self.backend_name = "hashed-token-fallback"
79
+ self.notes = ["Using hashed torch embeddings because pretrained weights are unavailable."]
80
+
81
+ def embed_texts(self, texts: Sequence[str]) -> torch.Tensor:
82
+ rows = torch.zeros((len(texts), self.dimensions), dtype=torch.float32)
83
+ for row_index, text in enumerate(texts):
84
+ tokens = re.findall(r"[A-Za-z_]+|\d+|==|!=|<=|>=|\S", text.lower())[:512]
85
+ if not tokens:
86
+ rows[row_index, 0] = 1.0
87
+ continue
88
+ for token in tokens:
89
+ digest = hashlib.md5(token.encode("utf-8")).hexdigest()
90
+ bucket = int(digest[:8], 16) % self.dimensions
91
+ sign = -1.0 if int(digest[8:10], 16) % 2 else 1.0
92
+ rows[row_index, bucket] += sign
93
+ return F.normalize(rows + 1e-6, dim=1)
94
+
95
+
96
+ class TransformersEmbeddingBackend:
97
+ """Mean-pool CodeBERTa embeddings via torch + transformers."""
98
+
99
+ def __init__(self, model_id: str = MODEL_ID, force_fallback: bool = False) -> None:
100
+ self.model_id = model_id
101
+ self.force_fallback = force_fallback
102
+ self.backend_name = model_id
103
+ self.notes: List[str] = []
104
+ self._fallback = HashingEmbeddingBackend()
105
+ self._tokenizer = None
106
+ self._model = None
107
+ self._load_error = ""
108
+ if force_fallback:
109
+ self.backend_name = self._fallback.backend_name
110
+ self.notes = list(self._fallback.notes)
111
+
112
+ def _ensure_loaded(self) -> None:
113
+ if self.force_fallback or self._model is not None or self._load_error:
114
+ return
115
+ if AutoTokenizer is None or AutoModel is None:
116
+ self._load_error = "transformers is not installed."
117
+ else:
118
+ try:
119
+ self._tokenizer = AutoTokenizer.from_pretrained(self.model_id)
120
+ self._model = AutoModel.from_pretrained(self.model_id)
121
+ self._model.eval()
122
+ self.notes.append(f"Loaded pretrained encoder `{self.model_id}` for inference.")
123
+ except Exception as exc:
124
+ self._load_error = f"{type(exc).__name__}: {exc}"
125
+
126
+ if self._load_error:
127
+ self.backend_name = self._fallback.backend_name
128
+ self.notes = list(self._fallback.notes) + [f"Pretrained load failed: {self._load_error}"]
129
+
130
+ def embed_texts(self, texts: Sequence[str]) -> torch.Tensor:
131
+ self._ensure_loaded()
132
+ if self._model is None or self._tokenizer is None:
133
+ return self._fallback.embed_texts(texts)
134
+
135
+ encoded = self._tokenizer(
136
+ list(texts),
137
+ padding=True,
138
+ truncation=True,
139
+ max_length=MODEL_MAX_LENGTH,
140
+ return_tensors="pt",
141
+ )
142
+ with torch.no_grad():
143
+ outputs = self._model(**encoded)
144
+ hidden_state = outputs.last_hidden_state
145
+ mask = encoded["attention_mask"].unsqueeze(-1)
146
+ pooled = (hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
147
+ return F.normalize(pooled, dim=1)
148
+
149
+
150
+ def _sanitize_text(value: str) -> str:
151
+ text = (value or "").strip()
152
+ return text[:4000]
153
+
154
+
155
+ def _safe_softmax(scores: dict[IssueLabel, float]) -> dict[str, float]:
156
+ tensor = torch.tensor([scores[label] for label in LABELS], dtype=torch.float32)
157
+ probabilities = torch.softmax(tensor * 4.0, dim=0)
158
+ return {label: round(float(probabilities[index]), 4) for index, label in enumerate(LABELS)}
159
+
160
+
161
+ def _loop_depth(code: str) -> int:
162
+ try:
163
+ tree = ast.parse(code)
164
+ except SyntaxError:
165
+ return 0
166
+ visitor = _LoopDepthVisitor()
167
+ visitor.visit(tree)
168
+ return visitor.max_depth
169
+
170
+
171
+ def _repair_risk(label: IssueLabel, confidence: float, signal_count: int) -> str:
172
+ base = {"syntax": 0.25, "logic": 0.55, "performance": 0.7}[label]
173
+ if confidence < 0.55:
174
+ base += 0.12
175
+ if signal_count >= 4:
176
+ base += 0.08
177
+ if base < 0.4:
178
+ return "low"
179
+ if base < 0.72:
180
+ return "medium"
181
+ return "high"
182
+
183
+
184
+ def _clamp_unit(value: float) -> float:
185
+ return round(max(0.0, min(1.0, float(value))), 4)
186
+
187
+
188
+ def _lint_score(code: str) -> float:
189
+ stripped_lines = [line.rstrip("\n") for line in code.splitlines()]
190
+ if not stripped_lines:
191
+ return 0.2
192
+
193
+ score = 1.0
194
+ if any(len(line) > 88 for line in stripped_lines):
195
+ score -= 0.15
196
+ if any(line.rstrip() != line for line in stripped_lines):
197
+ score -= 0.1
198
+ if any("\t" in line for line in stripped_lines):
199
+ score -= 0.1
200
+ try:
201
+ tree = ast.parse(code)
202
+ functions = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
203
+ if functions and not ast.get_docstring(functions[0]):
204
+ score -= 0.08
205
+ except SyntaxError:
206
+ score -= 0.45
207
+ return _clamp_unit(score)
208
+
209
+
210
+ def _complexity_penalty(code: str) -> float:
211
+ try:
212
+ tree = ast.parse(code)
213
+ except SyntaxError:
214
+ return 0.95
215
+ branch_nodes = sum(isinstance(node, (ast.If, ast.For, ast.While, ast.Try, ast.Match)) for node in ast.walk(tree))
216
+ loop_depth = _loop_depth(code)
217
+ penalty = 0.1 + min(branch_nodes, 8) * 0.07 + min(loop_depth, 4) * 0.12
218
+ return _clamp_unit(penalty)
219
+
220
+
221
+ class CodeTriageEngine:
222
+ """Combine static signals with PyTorch embeddings to classify code issues."""
223
+
224
+ def __init__(
225
+ self,
226
+ *,
227
+ backend: TransformersEmbeddingBackend | HashingEmbeddingBackend | None = None,
228
+ prototypes: Sequence[TriagePrototype] | None = None,
229
+ examples: Sequence[TriageExample] | None = None,
230
+ ) -> None:
231
+ self.backend = backend or TransformersEmbeddingBackend()
232
+ self.prototypes = list(prototypes or build_prototypes())
233
+ self.examples = list(examples or build_examples())
234
+ self._prototype_matrix: torch.Tensor | None = None
235
+ self._reference_code_matrix: torch.Tensor | None = None
236
+
237
+ def example_map(self) -> dict[str, TriageExample]:
238
+ """Return UI examples keyed by task id."""
239
+
240
+ return {example.key: example for example in self.examples}
241
+
242
+ def _build_document(self, code: str, traceback_text: str) -> str:
243
+ trace = _sanitize_text(traceback_text) or "No traceback supplied."
244
+ snippet = _sanitize_text(code) or "# No code supplied."
245
+ return f"Candidate code:\n{snippet}\n\nObserved failure:\n{trace}\n"
246
+
247
+ def _build_review_document(self, code: str, traceback_text: str, context_window: str) -> str:
248
+ context = _sanitize_text(context_window) or "No additional context window supplied."
249
+ return (
250
+ f"{self._build_document(code, traceback_text)}\n"
251
+ f"Context window:\n{context}\n"
252
+ )
253
+
254
+ def _prototype_embeddings(self) -> torch.Tensor:
255
+ if self._prototype_matrix is None:
256
+ reference_texts = [prototype.reference_text for prototype in self.prototypes]
257
+ self._prototype_matrix = self.backend.embed_texts(reference_texts)
258
+ return self._prototype_matrix
259
+
260
+ def _reference_code_embeddings(self) -> torch.Tensor:
261
+ if self._reference_code_matrix is None:
262
+ reference_codes = [prototype.reference_code for prototype in self.prototypes]
263
+ self._reference_code_matrix = self.backend.embed_texts(reference_codes)
264
+ return self._reference_code_matrix
265
+
266
+ def _extract_signals(self, code: str, traceback_text: str) -> tuple[list[TriageSignal], dict[IssueLabel, float], list[str]]:
267
+ trace = (traceback_text or "").lower()
268
+ heuristic_scores: dict[IssueLabel, float] = {label: 0.15 for label in LABELS}
269
+ signals: list[TriageSignal] = []
270
+ notes: list[str] = []
271
+
272
+ try:
273
+ ast.parse(code)
274
+ signals.append(
275
+ TriageSignal(
276
+ name="syntax_parse",
277
+ value="passes",
278
+ impact="syntax",
279
+ weight=0.1,
280
+ evidence="Python AST parsing succeeded.",
281
+ )
282
+ )
283
+ heuristic_scores["logic"] += 0.05
284
+ except SyntaxError as exc:
285
+ evidence = f"{exc.msg} at line {exc.lineno}"
286
+ signals.append(
287
+ TriageSignal(
288
+ name="syntax_parse",
289
+ value="fails",
290
+ impact="syntax",
291
+ weight=0.95,
292
+ evidence=evidence,
293
+ )
294
+ )
295
+ heuristic_scores["syntax"] += 0.85
296
+ notes.append(f"Parser failure detected: {evidence}")
297
+
298
+ if any(token in trace for token in ("syntaxerror", "indentationerror", "expected ':'")):
299
+ signals.append(
300
+ TriageSignal(
301
+ name="traceback_keyword",
302
+ value="syntaxerror",
303
+ impact="syntax",
304
+ weight=0.8,
305
+ evidence="Traceback contains a parser error.",
306
+ )
307
+ )
308
+ heuristic_scores["syntax"] += 0.55
309
+
310
+ if any(token in trace for token in ("assertionerror", "expected:", "actual:", "boundary", "missing", "incorrect")):
311
+ signals.append(
312
+ TriageSignal(
313
+ name="test_failure_signal",
314
+ value="assertion-style failure",
315
+ impact="logic",
316
+ weight=0.7,
317
+ evidence="Failure text points to behavioral mismatch instead of parser issues.",
318
+ )
319
+ )
320
+ heuristic_scores["logic"] += 0.55
321
+
322
+ if any(token in trace for token in ("timeout", "benchmark", "slow", "latency", "performance", "profiler")):
323
+ signals.append(
324
+ TriageSignal(
325
+ name="performance_trace",
326
+ value="latency regression",
327
+ impact="performance",
328
+ weight=0.85,
329
+ evidence="Traceback mentions benchmark or latency pressure.",
330
+ )
331
+ )
332
+ heuristic_scores["performance"] += 0.7
333
+
334
+ loop_depth = _loop_depth(code)
335
+ if loop_depth >= 2:
336
+ signals.append(
337
+ TriageSignal(
338
+ name="loop_depth",
339
+ value=str(loop_depth),
340
+ impact="performance",
341
+ weight=0.65,
342
+ evidence="Nested iteration increases runtime risk on larger fixtures.",
343
+ )
344
+ )
345
+ heuristic_scores["performance"] += 0.35
346
+
347
+ if "Counter(" in code or "defaultdict(" in code or "set(" in code:
348
+ heuristic_scores["performance"] += 0.05
349
+
350
+ if "return sessions" in code and "sessions.append" not in code:
351
+ signals.append(
352
+ TriageSignal(
353
+ name="state_update_gap",
354
+ value="possible missing final append",
355
+ impact="logic",
356
+ weight=0.45,
357
+ evidence="A collection is returned without an obvious final state flush.",
358
+ )
359
+ )
360
+ heuristic_scores["logic"] += 0.18
361
+
362
+ return signals, heuristic_scores, notes
363
+
364
+ def _nearest_match(self, embedding: torch.Tensor) -> tuple[TriagePrototype, float, dict[str, float]]:
365
+ similarities = torch.matmul(embedding, self._prototype_embeddings().T)[0]
366
+ indexed_scores = {
367
+ self.prototypes[index].task_id: round(float((similarities[index] + 1.0) / 2.0), 4)
368
+ for index in range(len(self.prototypes))
369
+ }
370
+ best_index = int(torch.argmax(similarities).item())
371
+ best_prototype = self.prototypes[best_index]
372
+ best_similarity = float((similarities[best_index] + 1.0) / 2.0)
373
+ return best_prototype, best_similarity, indexed_scores
374
+
375
+ def _repair_plan(self, label: IssueLabel, matched: TriagePrototype, context_window: str) -> list[str]:
376
+ context = _sanitize_text(context_window)
377
+ step_one = {
378
+ "syntax": "Step 1 - Syntax checking and bug fixes: resolve the parser break before touching behavior, then align the function with the expected contract.",
379
+ "logic": "Step 1 - Syntax checking and bug fixes: confirm the code parses cleanly, then patch the failing branch or state update causing the incorrect result.",
380
+ "performance": "Step 1 - Syntax checking and bug fixes: keep the implementation correct first, then isolate the slow section without changing external behavior.",
381
+ }[label]
382
+ step_two = (
383
+ "Step 2 - Edge case handling: verify empty input, boundary values, missing fields, and final-state flush behavior "
384
+ f"against the known pattern `{matched.title}`."
385
+ )
386
+ step_three = (
387
+ "Step 3 - Scalability of code: remove repeated full scans, prefer linear-time data structures, "
388
+ "and benchmark the path on a production-like fixture."
389
+ )
390
+ if context:
391
+ step_two = f"{step_two} Context window to preserve: {context}"
392
+ return [step_one, step_two, step_three]
393
+
394
+ def _reference_quality_score(self, code: str, matched: TriagePrototype) -> float:
395
+ candidate = self.backend.embed_texts([_sanitize_text(code) or "# empty"])
396
+ match_index = next(index for index, prototype in enumerate(self.prototypes) if prototype.task_id == matched.task_id)
397
+ reference = self._reference_code_embeddings()[match_index : match_index + 1]
398
+ score = float(torch.matmul(candidate, reference.T)[0][0].item())
399
+ return _clamp_unit((score + 1.0) / 2.0)
400
+
401
+    def triage(self, code: str, traceback_text: str = "", context_window: str = "") -> TriageResult:
+        """Run the full triage pipeline on code plus optional failure context."""
+
+        started = time.perf_counter()
+        document = self._build_review_document(code, traceback_text, context_window)
+        signals, heuristic_scores, notes = self._extract_signals(code, traceback_text)
+
+        candidate_embedding = self.backend.embed_texts([document])
+        matched, matched_similarity, prototype_scores = self._nearest_match(candidate_embedding)
+
+        label_similarity = {label: 0.18 for label in LABELS}
+        for prototype in self.prototypes:
+            label_similarity[prototype.label] = max(
+                label_similarity[prototype.label],
+                prototype_scores[prototype.task_id],
+            )
+
+        combined_scores = {
+            label: 0.72 * label_similarity[label] + 0.28 * heuristic_scores[label]
+            for label in LABELS
+        }
+        confidence_scores = _safe_softmax(combined_scores)
+        issue_label = max(LABELS, key=lambda label: confidence_scores[label])
+        top_confidence = confidence_scores[issue_label]
+
+        top_signal = signals[0].evidence if signals else "Model similarity dominated the decision."
+        ml_quality_score = self._reference_quality_score(code, matched)
+        lint_score = _lint_score(code)
+        complexity_penalty = _complexity_penalty(code)
+        reward_score = _clamp_unit((0.5 * ml_quality_score) + (0.3 * lint_score) - (0.2 * complexity_penalty))
+        summary = (
+            f"Detected a {issue_label} issue with {top_confidence:.0%} confidence. "
+            f"The closest known failure pattern is `{matched.title}`, which indicates {matched.summary.lower()}. "
+            f"Predicted quality score is {ml_quality_score:.0%} with an RL-ready reward of {reward_score:.0%}."
+        )
+        suggested_next_action = {
+            "syntax": "Fix the parser error first, then rerun validation before changing behavior.",
+            "logic": "Step through the smallest failing case and confirm the final branch/update behavior.",
+            "performance": "Replace repeated full-list scans with a linear-time aggregation strategy, then benchmark it.",
+        }[issue_label]
+
+        return TriageResult(
+            issue_label=issue_label,
+            confidence_scores=confidence_scores,
+            repair_risk=_repair_risk(issue_label, top_confidence, len(signals)),
+            ml_quality_score=ml_quality_score,
+            lint_score=lint_score,
+            complexity_penalty=complexity_penalty,
+            reward_score=reward_score,
+            summary=summary,
+            matched_pattern=PrototypeMatch(
+                task_id=matched.task_id,
+                title=matched.title,
+                label=matched.label,
+                similarity=round(matched_similarity, 4),
+                summary=matched.summary,
+                rationale=top_signal,
+            ),
+            repair_plan=self._repair_plan(issue_label, matched, context_window),
+            suggested_next_action=suggested_next_action,
+            extracted_signals=signals,
+            model_backend=self.backend.backend_name,
+            model_id=self.backend.model_id,
+            inference_notes=list(self.backend.notes) + notes,
+            analysis_time_ms=round((time.perf_counter() - started) * 1000.0, 2),
+        )
+
+
+@lru_cache(maxsize=1)
+def get_default_engine() -> CodeTriageEngine:
+    """Return a cached triage engine for the running process."""
+
+    return CodeTriageEngine()
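The scoring step above blends embedding similarity (weight 0.72) with heuristic scores (weight 0.28) and normalizes the result through `_safe_softmax`, which is defined elsewhere in this module. The sketch below reproduces that blend with hypothetical input values and an assumed max-shifted softmax, so the exact numbers are illustrative rather than taken from the engine:

```python
import math

LABELS = ("syntax", "logic", "performance")

def safe_softmax(scores):
    # Subtract the max score before exponentiating so large values cannot overflow.
    peak = max(scores.values())
    exps = {label: math.exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}

# Hypothetical similarity/heuristic values for one candidate snippet.
label_similarity = {"syntax": 0.91, "logic": 0.34, "performance": 0.22}
heuristic_scores = {"syntax": 0.80, "logic": 0.10, "performance": 0.10}

combined = {
    label: 0.72 * label_similarity[label] + 0.28 * heuristic_scores[label]
    for label in LABELS
}
confidence = safe_softmax(combined)
issue_label = max(LABELS, key=lambda label: confidence[label])
print(issue_label)  # "syntax" wins for these inputs
```

Because softmax is monotonic, the label with the highest combined score always wins; the normalization only turns the raw blend into a probability-like confidence profile.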
triage_catalog.py ADDED
@@ -0,0 +1,134 @@
+"""Curated prototypes and example inputs for TorchReview Copilot."""
+
+from __future__ import annotations
+
+from typing import Dict, List
+
+try:
+    from .triage_models import IssueLabel, TriageExample, TriagePrototype
+    from .tasks import list_tasks
+except ImportError:
+    from triage_models import IssueLabel, TriageExample, TriagePrototype
+    from tasks import list_tasks
+
+
+TASK_KIND_TO_LABEL: Dict[str, IssueLabel] = {
+    "syntax_fix": "syntax",
+    "bug_fix": "logic",
+    "optimization": "performance",
+}
+
+TRACEBACK_BY_TASK_ID: Dict[str, str] = {
+    "syntax_fix_invoice_totals": (
+        "Traceback (most recent call last):\n"
+        " File \"services/billing/reconciliation.py\", line 3\n"
+        " for record in records\n"
+        " ^\n"
+        "SyntaxError: expected ':'"
+    ),
+    "bug_fix_session_windows": (
+        "AssertionError: collapse_sessions([{'minute': 1}, {'minute': 3}, {'minute': 8}], 4)\n"
+        "Expected: [(1, 3), (8, 8)]\n"
+        "Actual: [(1, 8)]\n"
+        "Boundary handling merges the final session instead of starting a new one."
+    ),
+    "optimization_rank_active_users": (
+        "BenchmarkWarning: rank_active_users exceeded the 450ms budget on a nightly export fixture.\n"
+        "Profiler hint: repeated scans over the full event list and nested loops dominate runtime."
+    ),
+}
+
+SUMMARY_BY_TASK_ID: Dict[str, str] = {
+    "syntax_fix_invoice_totals": "Broken parser state in a billing helper blocks reconciliation jobs.",
+    "bug_fix_session_windows": "Session-boundary logic fails on inclusive idle-timeout edges.",
+    "optimization_rank_active_users": "A nightly ranking job is correct on small fixtures but too slow at production scale.",
+}
+
+CONTEXT_BY_TASK_ID: Dict[str, str] = {
+    "syntax_fix_invoice_totals": (
+        "Context window: this helper runs in an end-of-day billing reconciliation job. "
+        "Keep the public function signature intact and restore correct totals for mixed integer/string inputs."
+    ),
+    "bug_fix_session_windows": (
+        "Context window: this function groups sorted product analytics events into sessions for retention dashboards. "
+        "Boundary behavior must stay deterministic because downstream reports depend on it."
+    ),
+    "optimization_rank_active_users": (
+        "Context window: this pipeline feeds a nightly export on a small CPU instance. "
+        "Maintain identical output ordering while improving scalability on larger event volumes."
+    ),
+}
+
+
+def _prototype_text(
+    task_id: str,
+    title: str,
+    description: str,
+    repo_summary: str,
+    goal: str,
+    visible_tests: List[str],
+    starter_code: str,
+    traceback_text: str,
+) -> str:
+    visible = "\n".join(f"- {item}" for item in visible_tests) or "- none"
+    return (
+        f"Title: {title}\n"
+        f"Problem: {description}\n"
+        f"Repo context: {repo_summary}\n"
+        f"Goal: {goal}\n"
+        f"Observed failure:\n{traceback_text}\n"
+        f"Visible checks:\n{visible}\n"
+        f"Candidate code:\n{starter_code}\n"
+        f"Task id: {task_id}\n"
+    )
+
+
+def build_examples() -> List[TriageExample]:
+    """Create stable UI examples from the task catalog."""
+
+    examples: List[TriageExample] = []
+    for task in list_tasks():
+        label = TASK_KIND_TO_LABEL[task.task_kind]
+        examples.append(
+            TriageExample(
+                key=task.task_id,
+                title=task.title,
+                label=label,
+                summary=SUMMARY_BY_TASK_ID[task.task_id],
+                code=task.starter_code,
+                traceback_text=TRACEBACK_BY_TASK_ID[task.task_id],
+                context_window=CONTEXT_BY_TASK_ID[task.task_id],
+                task_id=task.task_id,
+            )
+        )
+    return examples
+
+
+def build_prototypes() -> List[TriagePrototype]:
+    """Build canonical triage prototypes from the OpenEnv tasks."""
+
+    prototypes: List[TriagePrototype] = []
+    for task in list_tasks():
+        traceback_text = TRACEBACK_BY_TASK_ID[task.task_id]
+        prototypes.append(
+            TriagePrototype(
+                task_id=task.task_id,
+                title=task.title,
+                label=TASK_KIND_TO_LABEL[task.task_kind],
+                summary=SUMMARY_BY_TASK_ID[task.task_id],
+                reference_text=_prototype_text(
+                    task.task_id,
+                    task.title,
+                    task.task_description,
+                    task.repo_summary,
+                    task.goal,
+                    list(task.visible_tests),
+                    task.reference_code,
+                    traceback_text,
+                ),
+                starter_code=task.starter_code,
+                reference_code=task.reference_code,
+                traceback_text=traceback_text,
+            )
+        )
+    return prototypes
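`_prototype_text` above is plain string assembly, so its behavior is easy to verify in isolation. The snippet below re-declares the helper with toy field values (all hypothetical, not from the task catalog) to show the `- none` fallback when a task has no visible tests:

```python
from typing import List

def _prototype_text(
    task_id: str,
    title: str,
    description: str,
    repo_summary: str,
    goal: str,
    visible_tests: List[str],
    starter_code: str,
    traceback_text: str,
) -> str:
    # Mirrors the catalog helper: bullet each visible test, fall back to "- none".
    visible = "\n".join(f"- {item}" for item in visible_tests) or "- none"
    return (
        f"Title: {title}\n"
        f"Problem: {description}\n"
        f"Repo context: {repo_summary}\n"
        f"Goal: {goal}\n"
        f"Observed failure:\n{traceback_text}\n"
        f"Visible checks:\n{visible}\n"
        f"Candidate code:\n{starter_code}\n"
        f"Task id: {task_id}\n"
    )

text = _prototype_text(
    "demo_task", "Demo", "Example problem.", "Toy repo.", "Fix it.",
    [], "def f():\n    pass", "ValueError: boom",
)
```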
triage_models.py ADDED
@@ -0,0 +1,79 @@
+"""Typed models for TorchReview Copilot outputs and examples."""
+
+from __future__ import annotations
+
+from typing import Dict, List, Literal
+
+from pydantic import BaseModel, Field
+
+
+IssueLabel = Literal["syntax", "logic", "performance"]
+RiskLevel = Literal["low", "medium", "high"]
+
+
+class TriageSignal(BaseModel):
+    """One extracted signal used during issue classification."""
+
+    name: str
+    value: str
+    impact: Literal["syntax", "logic", "performance", "mixed"] = "mixed"
+    weight: float = Field(..., ge=0.0, le=1.0)
+    evidence: str = ""
+
+
+class PrototypeMatch(BaseModel):
+    """Nearest known bug pattern from the built-in task catalog."""
+
+    task_id: str
+    title: str
+    label: IssueLabel
+    similarity: float = Field(..., ge=0.0, le=1.0)
+    summary: str
+    rationale: str
+
+
+class TriageExample(BaseModel):
+    """Example payload exposed in the demo UI."""
+
+    key: str
+    title: str
+    label: IssueLabel
+    summary: str
+    code: str
+    traceback_text: str
+    context_window: str
+    task_id: str
+
+
+class TriagePrototype(BaseModel):
+    """Canonical issue-pattern representation embedded by the triage engine."""
+
+    task_id: str
+    title: str
+    label: IssueLabel
+    summary: str
+    reference_text: str
+    starter_code: str
+    reference_code: str
+    traceback_text: str
+
+
+class TriageResult(BaseModel):
+    """Structured output produced by the triage pipeline."""
+
+    issue_label: IssueLabel
+    confidence_scores: Dict[str, float]
+    repair_risk: RiskLevel
+    ml_quality_score: float = Field(..., ge=0.0, le=1.0)
+    lint_score: float = Field(..., ge=0.0, le=1.0)
+    complexity_penalty: float = Field(..., ge=0.0, le=1.0)
+    reward_score: float = Field(..., ge=0.0, le=1.0)
+    summary: str
+    matched_pattern: PrototypeMatch
+    repair_plan: List[str]
+    suggested_next_action: str
+    extracted_signals: List[TriageSignal] = Field(default_factory=list)
+    model_backend: str
+    model_id: str
+    inference_notes: List[str] = Field(default_factory=list)
+    analysis_time_ms: float = Field(..., ge=0.0)
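The `Field(..., ge=0.0, le=1.0)` declarations above make pydantic reject any score outside the unit interval at construction time. A dependency-free sketch of the same bound check (the class name and error message here are illustrative, not from the codebase):

```python
from dataclasses import dataclass

@dataclass
class ScoreBounds:
    """Illustrative stand-in for a pydantic model with ge=0.0/le=1.0 fields."""

    similarity: float

    def __post_init__(self) -> None:
        # Reject values outside the unit interval, as Field(ge=0.0, le=1.0) would.
        if not 0.0 <= self.similarity <= 1.0:
            raise ValueError("similarity must lie in [0.0, 1.0]")

ok = ScoreBounds(similarity=0.87)       # accepted
try:
    ScoreBounds(similarity=1.2)         # out of range
    rejected = False
except ValueError:
    rejected = True
```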
utils/__init__.py ADDED
@@ -0,0 +1,6 @@
+"""Utility helpers for AST parsing and complexity scoring."""
+
+from .ast_parser import parse_code_structure
+from .complexity import estimate_complexity
+
+__all__ = ["parse_code_structure", "estimate_complexity"]
utils/ast_parser.py ADDED
@@ -0,0 +1,144 @@
+"""Static parsing helpers for multi-domain Python code analysis."""
+
+from __future__ import annotations
+
+import ast
+from typing import Any, Dict, List
+
+
+class _LoopDepthVisitor(ast.NodeVisitor):
+    """Collect loop nesting depth for a parsed Python module."""
+
+    def __init__(self) -> None:
+        self.depth = 0
+        self.max_depth = 0
+
+    def _visit_loop(self, node: ast.AST) -> None:
+        self.depth += 1
+        self.max_depth = max(self.max_depth, self.depth)
+        self.generic_visit(node)
+        self.depth -= 1
+
+    def visit_For(self, node: ast.For) -> None:  # noqa: N802
+        self._visit_loop(node)
+
+    def visit_While(self, node: ast.While) -> None:  # noqa: N802
+        self._visit_loop(node)
+
+    def visit_comprehension(self, node: ast.comprehension) -> None:  # noqa: N802
+        self._visit_loop(node)
+
+
+def parse_code_structure(code: str) -> Dict[str, Any]:
+    """Parse Python code into reusable structural signals."""
+
+    summary: Dict[str, Any] = {
+        "syntax_valid": True,
+        "syntax_error": "",
+        "imports": [],
+        "function_names": [],
+        "class_names": [],
+        "loop_count": 0,
+        "branch_count": 0,
+        "max_loop_depth": 0,
+        "line_count": len(code.splitlines()),
+        "long_lines": 0,
+        "tabs_used": "\t" in code,
+        "trailing_whitespace_lines": 0,
+        "uses_numpy": False,
+        "uses_pandas": False,
+        "uses_torch": False,
+        "uses_sklearn": False,
+        "uses_fastapi": False,
+        "uses_flask": False,
+        "uses_pydantic": False,
+        "uses_recursion": False,
+        "calls_eval": False,
+        "calls_no_grad": False,
+        "calls_backward": False,
+        "calls_optimizer_step": False,
+        "route_decorators": [],
+        "docstring_ratio": 0.0,
+        "code_smells": [],
+    }
+
+    lines = code.splitlines()
+    summary["long_lines"] = sum(1 for line in lines if len(line) > 88)
+    summary["trailing_whitespace_lines"] = sum(1 for line in lines if line.rstrip() != line)
+
+    try:
+        tree = ast.parse(code)
+    except SyntaxError as exc:
+        summary["syntax_valid"] = False
+        summary["syntax_error"] = f"{exc.msg} (line {exc.lineno})"
+        summary["code_smells"].append("Code does not parse.")
+        return summary
+
+    visitor = _LoopDepthVisitor()
+    visitor.visit(tree)
+    summary["max_loop_depth"] = visitor.max_depth
+
+    functions = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
+    summary["function_names"] = [node.name for node in functions]
+    summary["class_names"] = [node.name for node in tree.body if isinstance(node, ast.ClassDef)]
+    summary["docstring_ratio"] = (
+        sum(1 for node in functions if ast.get_docstring(node)) / len(functions)
+        if functions
+        else 0.0
+    )
+
+    imports: List[str] = []
+    for node in ast.walk(tree):
+        if isinstance(node, ast.Import):
+            imports.extend(alias.name.split(".")[0] for alias in node.names)
+        elif isinstance(node, ast.ImportFrom) and node.module:
+            imports.append(node.module.split(".")[0])
+        elif isinstance(node, (ast.For, ast.While, ast.comprehension)):
+            summary["loop_count"] += 1
+        elif isinstance(node, (ast.If, ast.Try, ast.Match)):
+            summary["branch_count"] += 1
+        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
+            attr = node.func.attr
+            if attr == "eval":
+                summary["calls_eval"] = True
+            elif attr == "backward":
+                summary["calls_backward"] = True
+            elif attr == "step":
+                summary["calls_optimizer_step"] = True
+        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == "print":
+            summary["code_smells"].append("Debug print statements are present.")
+        elif isinstance(node, ast.With):
+            if any(
+                isinstance(item.context_expr, ast.Call)
+                and isinstance(item.context_expr.func, ast.Attribute)
+                and item.context_expr.func.attr == "no_grad"
+                for item in node.items
+            ):
+                summary["calls_no_grad"] = True
+
+    import_set = sorted(set(imports))
+    summary["imports"] = import_set
+    summary["uses_numpy"] = "numpy" in import_set or "np" in code
+    summary["uses_pandas"] = "pandas" in import_set or "pd" in code
+    summary["uses_torch"] = "torch" in import_set
+    summary["uses_sklearn"] = "sklearn" in import_set
+    summary["uses_fastapi"] = "fastapi" in import_set
+    summary["uses_flask"] = "flask" in import_set
+    summary["uses_pydantic"] = "pydantic" in import_set or "BaseModel" in code
+
+    for node in functions:
+        for child in ast.walk(node):
+            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name) and child.func.id == node.name:
+                summary["uses_recursion"] = True
+
+    for node in ast.walk(tree):
+        if isinstance(node, ast.FunctionDef):
+            for decorator in node.decorator_list:
+                if isinstance(decorator, ast.Call) and isinstance(decorator.func, ast.Attribute):
+                    summary["route_decorators"].append(decorator.func.attr)
+                elif isinstance(decorator, ast.Attribute):
+                    summary["route_decorators"].append(decorator.attr)
+
+    if summary["long_lines"]:
+        summary["code_smells"].append("Long lines reduce readability.")
+    if summary["tabs_used"]:
+        summary["code_smells"].append("Tabs detected; prefer spaces for consistency.")
+    if summary["trailing_whitespace_lines"]:
+        summary["code_smells"].append("Trailing whitespace found.")
+
+    return summary
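The enter/record/recurse/leave pattern that `_LoopDepthVisitor` uses above is easy to exercise on its own. This reduced sketch tracks only `for`/`while` nesting (dropping the comprehension handling) to show how the running maximum captures the deepest loop level:

```python
import ast

class LoopDepthVisitor(ast.NodeVisitor):
    """Track the deepest for/while nesting in a parsed module."""

    def __init__(self) -> None:
        self.depth = 0
        self.max_depth = 0

    def _visit_loop(self, node: ast.AST) -> None:
        # Enter the loop, record the running maximum, recurse, then leave.
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        self.generic_visit(node)
        self.depth -= 1

    # Alias both loop node types to the shared handler.
    visit_For = _visit_loop
    visit_While = _visit_loop

snippet = "for i in range(3):\n    for j in range(3):\n        total = i * j\n"
visitor = LoopDepthVisitor()
visitor.visit(ast.parse(snippet))
print(visitor.max_depth)  # → 2
```

Decrementing `depth` after `generic_visit` is what makes sibling loops at the same level count once rather than accumulate.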
utils/complexity.py ADDED
@@ -0,0 +1,37 @@
+"""Complexity heuristics for DSA-style and general Python code."""
+
+from __future__ import annotations
+
+from typing import Any, Dict
+
+
+def estimate_complexity(parsed: Dict[str, Any], code: str) -> Dict[str, Any]:
+    """Estimate cyclomatic complexity and rough Big-O heuristics."""
+
+    cyclomatic = 1 + int(parsed.get("branch_count", 0))
+    loop_depth = int(parsed.get("max_loop_depth", 0))
+    uses_recursion = bool(parsed.get("uses_recursion", False))
+
+    if loop_depth >= 3:
+        time_complexity = "O(n^3)"
+    elif loop_depth == 2:
+        time_complexity = "O(n^2)"
+    elif "sorted(" in code or ".sort(" in code:
+        time_complexity = "O(n log n)"
+    elif loop_depth == 1 or uses_recursion:
+        time_complexity = "O(n)"
+    else:
+        time_complexity = "O(1)"
+
+    if "append(" in code or "list(" in code or "dict(" in code or "set(" in code:
+        space_complexity = "O(n)"
+    else:
+        space_complexity = "O(1)"
+
+    complexity_penalty = min(0.99, 0.08 + (cyclomatic * 0.04) + (loop_depth * 0.12))
+    return {
+        "cyclomatic_complexity": cyclomatic,
+        "time_complexity": time_complexity,
+        "space_complexity": space_complexity,
+        "complexity_penalty": round(complexity_penalty, 4),
+    }
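As a quick sanity check on the heuristics above, a doubly nested loop with two branches should score as quadratic time with a moderate penalty. This sketch inlines the function (lightly restructured) so it runs standalone; the sample `parsed` dict is hand-written for illustration, not produced by `parse_code_structure`:

```python
from typing import Any, Dict

def estimate_complexity(parsed: Dict[str, Any], code: str) -> Dict[str, Any]:
    # Same heuristic as the module above: loop depth drives the Big-O guess,
    # branches drive the cyclomatic count, and both feed the penalty term.
    cyclomatic = 1 + int(parsed.get("branch_count", 0))
    loop_depth = int(parsed.get("max_loop_depth", 0))
    uses_recursion = bool(parsed.get("uses_recursion", False))

    if loop_depth >= 3:
        time_complexity = "O(n^3)"
    elif loop_depth == 2:
        time_complexity = "O(n^2)"
    elif "sorted(" in code or ".sort(" in code:
        time_complexity = "O(n log n)"
    elif loop_depth == 1 or uses_recursion:
        time_complexity = "O(n)"
    else:
        time_complexity = "O(1)"

    # Any growable-container token flips the space estimate to linear.
    space_complexity = "O(n)" if any(tok in code for tok in ("append(", "list(", "dict(", "set(")) else "O(1)"
    complexity_penalty = min(0.99, 0.08 + (cyclomatic * 0.04) + (loop_depth * 0.12))
    return {
        "cyclomatic_complexity": cyclomatic,
        "time_complexity": time_complexity,
        "space_complexity": space_complexity,
        "complexity_penalty": round(complexity_penalty, 4),
    }

parsed = {"branch_count": 2, "max_loop_depth": 2, "uses_recursion": False}
result = estimate_complexity(parsed, "for i in xs:\n    for j in xs:\n        out.append(i + j)\n")
print(result["time_complexity"])  # → O(n^2)
```

With `branch_count=2` the cyclomatic count is 3, and the penalty works out to 0.08 + 3*0.04 + 2*0.12 = 0.44, comfortably under the 0.99 cap.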