uvpatel7271 committed on
Commit a83cb85 · verified · 1 Parent(s): 4451363

Upload folder using huggingface_hub
DEMO_SCRIPT.md ADDED
@@ -0,0 +1,12 @@
+ # TorchReview Copilot Demo Script
+
+ ## 60-90 Second Walkthrough
+
+ 1. Open the Hugging Face Space and introduce TorchReview Copilot as an AI-powered code review and improvement system built with PyTorch.
+ 2. Point to the problem statement: manual code review is slow, inconsistent, and hard to scale.
+ 3. Select the `Fix the invoice total syntax regression` example to show the app loading a broken code sample together with the context window.
+ 4. Highlight the **Live Triage Radar**, the ML quality score, and the RL-ready reward score.
+ 5. Explain that the PyTorch layer uses CodeBERTa embeddings to compare the input against known code-quality patterns from the OpenEnv task catalog.
+ 6. Scroll to the three-step improvement plan and call out the progression: syntax and bug fixes, edge cases, then scalability.
+ 7. Switch to the performance example to show the confidence profile and reward changing for a different class of issue.
+ 8. Close by noting that OpenEnv still powers deterministic validation under the hood, so the demo remains grounded in measurable task outcomes.
Dockerfile CHANGED
@@ -6,9 +6,16 @@ ENV PYTHONDONTWRITEBYTECODE=1 \
 
  WORKDIR /app
 
- COPY pyproject.toml README.md openenv.yaml __init__.py client.py compat.py models.py inference.py /app/
+ COPY pyproject.toml README.md DEMO_SCRIPT.md openenv.yaml __init__.py client.py compat.py openenv_models.py inference.py triage.py triage_catalog.py triage_models.py launch.py /app/
+ COPY api /app/api
+ COPY app /app/app
+ COPY analyzers /app/analyzers
+ COPY models /app/models
+ COPY schemas /app/schemas
  COPY server /app/server
+ COPY services /app/services
  COPY tasks /app/tasks
+ COPY utils /app/utils
  COPY graders /app/graders
 
  RUN python -m pip install --upgrade pip && \
@@ -17,7 +24,7 @@ RUN python -m pip install --upgrade pip && \
  EXPOSE 8000
 
  HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
-     CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000/health', timeout=3).read()"
+     CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000', timeout=3).read()"
 
  ENV ENABLE_WEB_INTERFACE=true
- CMD ["uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "8000"]
+ CMD ["python", "launch.py"]
README.md CHANGED
@@ -1,189 +1,63 @@
  ---
- title: Python Code Review Environment
  colorFrom: yellow
- colorTo: blue
  sdk: docker
  pinned: false
  app_port: 8000
  tags:
  - openenv
  - code-review
- - python
  base_path: /web
  ---
 
- # python_code_review_env
-
- `python_code_review_env` is a production-style OpenEnv environment that simulates a realistic Python code review workflow. An agent inspects broken code, edits it, runs tests, and submits a final solution against deterministic graders for syntax repair, bug fixing, and optimization/refactoring.
-
- ## Environment design
-
- - `Observation` includes task instructions, current code, syntax errors, public test output, action history, and remaining attempts.
- - `Action` is structured as `analyze_code`, `edit_code`, `run_tests`, or `submit_solution`.
- - `Reward` is shaped and non-binary. The environment awards syntax progress, test progress, correctness, and quality improvements while penalizing invalid actions, timeouts, regressions, and unchanged edits.
- - `State` exposes the internal episode snapshot through `/state`.
-
- ## Task set
-
- 1. `syntax_fix_invoice_totals` (easy)
-    Fix a syntax regression in an invoice normalization helper.
- 2. `bug_fix_session_windows` (medium)
-    Repair a session-collapsing bug using deterministic public and hidden tests.
- 3. `optimization_rank_active_users` (hard)
-    Refactor a slow ranking function and earn additional score from runtime improvement plus AST/style quality.
-
- ## Action schema
-
- ```json
- {
-   "action_type": "edit_code",
-   "code": "def function(...):\n    ..."
- }
- ```
-
- Supported `action_type` values:
-
- - `analyze_code`
- - `edit_code`
- - `run_tests`
- - `submit_solution`
-
- ## Observation schema
-
- ```json
- {
-   "task_description": "...",
-   "current_code": "...",
-   "errors": "...",
-   "test_results": "...",
-   "history": []
- }
- ```
-
- The full observation also includes `task_id`, `difficulty`, `task_kind`, `visible_tests`, `attempts_remaining`, `score`, `last_action_status`, `reward`, `done`, and a structured `reward_details` breakdown.
-
- ## Deterministic grading
-
- - Syntax tasks use `compile()` plus hidden behavioral checks.
- - Bug-fix tasks use deterministic function-call cases that behave like pytest assertions.
- - Optimization tasks combine correctness, runtime benchmarking, and AST/style quality scoring.
- - Infinite loops and long-running solutions are sandboxed with subprocess timeouts and receive penalties.
- - All scores are clamped to `[0.0, 1.0]`.
-
- ## Run locally
-
- Install dependencies:
-
- ```bash
- pip install .
- ```
-
- Start the API server:
-
- ```bash
- uvicorn server.app:app --host 0.0.0.0 --port 8000
- ```
-
- Smoke-test the environment:
-
- ```bash
- curl http://localhost:8000/health
- curl http://localhost:8000/state
- ```
-
- OpenEnv validation:
-
- ```bash
- openenv validate
- ```
-
- ## Docker build
-
- The Docker image no longer depends on `ghcr.io/meta-pytorch/openenv-base:latest`, which removes the TLS handshake failure from the original build path.
-
- ```bash
- # Run from repo root
- docker build -t python-code-review-env -f server/Dockerfile .
- docker run --rm -p 8000:8000 python-code-review-env
- ```
-
- If you run the build from inside `server/`, you must point the context at the repo root:
-
- ```bash
- docker build -t python-code-review-env -f Dockerfile ..
- ```
-
- Expected health check:
-
- ```bash
- curl http://localhost:8000/health
- ```
-
- ## Hugging Face Spaces deployment
-
- 1. Create a Docker Space.
- 2. Push this repository content to the Space.
- 3. Ensure port `8000` is exposed.
- 4. Wait for the container to build.
- 5. Verify `/reset` and `/health` return `200`.
-
- The image is CPU-friendly and designed for a small Hugging Face Space such as `2 vCPU / 8 GB RAM`.
-
- ## Inference baseline
-
- `inference.py` uses an OpenAI-compatible client:
-
- ```python
- client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
- ```
-
- Supported providers include:
-
- - Gemini through an OpenAI-compatible gateway
- - OpenRouter
- - Together AI
- - DeepSeek-compatible OpenAI endpoints
-
- Run it with a free/open provider:
-
- ```bash
- set API_BASE_URL=https://openrouter.ai/api/v1
- set API_KEY=...
- set MODEL=deepseek/deepseek-chat-v3-0324:free
- python inference.py
- ```
-
- If no credentials are supplied, the script falls back to a deterministic smoke-test policy that applies the reference fix for each task so the environment can still be validated end to end.
-
- Example output:
-
- ```text
- Task 1 Score: 1.0
- Task 2 Score: 1.0
- Task 3 Score: 0.9
- Final Score: 1.0
- ```
-
- ## Project structure
 
  ```text
- python_env/
- ├── client.py
- ├── graders/
- │   ├── bug_fix.py
- │   ├── dispatch.py
- │   ├── optimization.py
- │   ├── shared.py
- │   └── syntax.py
- ├── inference.py
- ├── models.py
- ├── openenv.yaml
- ├── README.md
- ├── server/
- │   ├── app.py
- │   ├── Dockerfile
- │   ├── env.py
- │   └── python_env_environment.py
- └── tasks/
-     └── catalog.py
- ```

  ---
+ title: TorchReview Copilot
+ emoji: 🧠
  colorFrom: yellow
+ colorTo: red
  sdk: docker
  pinned: false
  app_port: 8000
  tags:
+ - pytorch
+ - gradio
+ - fastapi
  - openenv
  - code-review
  base_path: /web
  ---
 
+ # TorchReview Copilot
 
+ TorchReview Copilot is an **AI-powered code review and improvement system using PyTorch** to analyze Python code, predict quality, generate structured improvement suggestions, and compute an RL-ready reward score.
 
+ It upgrades the original OpenEnv hackathon environment into a judge-friendly product demo: a polished Hugging Face Space on top, with the deterministic OpenEnv validation engine preserved underneath.
 
+ **Live demo:** https://huggingface.co/spaces/uvpatel7271/final-python-env
+ **Repository:** https://github.com/uvpatel/final-python-env
 
+ ## Problem Statement
 
+ Engineering teams lose time during incident response and code review because broken Python snippets often arrive with noisy traces, partial test output, and unclear ownership. Before fixing anything, someone still has to answer:
 
+ - Is this a syntax issue, a logic bug, or a performance regression?
+ - How risky is the repair?
+ - What should be checked first?
 
+ That triage step is repetitive, error-prone, and often slows down the actual fix.
 
+ ## Solution
 
+ TorchReview Copilot turns code, traceback text, and a short context window into a practical code-review report:
 
+ - **Issue classification:** syntax, logic, or performance
+ - **ML quality score:** predicted code quality from PyTorch embeddings
+ - **Reward score:** an RL-ready score combining model quality, lint quality, and a complexity penalty
+ - **Live Triage Radar:** a confidence visualization for all issue classes
+ - **Nearest known pattern:** the closest OpenEnv task match
+ - **Improvement plan:** step 1 syntax/bug fixes, step 2 edge cases, step 3 scalability
 
+ ## Why PyTorch Matters
 
+ This project uses **PyTorch for real inference**, not placeholder branching:
 
+ - `transformers` + `torch` load `huggingface/CodeBERTa-small-v1`
+ - embeddings compare the input code against OpenEnv issue prototypes
+ - ML and static-analysis signals are combined into the final scores
 
+ ## How It Works
 
+ `Input → static checks → PyTorch embeddings → prediction → suggestions → reward`
 
+ ## Reward Formula
 
  ```text
+ reward = (0.5 x ML_quality_score) + (0.3 x lint_score) - (0.2 x complexity_penalty)
  ```
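The reward formula above is stated in the README but not shown as code in this commit; a minimal sketch of how such a score could be computed, assuming the environment's `[0.0, 1.0]` clamping convention (the function name `reward_score` is illustrative, not part of the repository):

```python
def reward_score(ml_quality: float, lint_score: float, complexity_penalty: float) -> float:
    """Combine model, lint, and complexity signals into one RL-ready reward.

    Weights follow the README's stated formula:
    0.5 * ML quality + 0.3 * lint quality - 0.2 * complexity penalty.
    """
    raw = 0.5 * ml_quality + 0.3 * lint_score - 0.2 * complexity_penalty
    # Clamp to [0.0, 1.0], matching the environment's documented score range.
    return max(0.0, min(1.0, raw))
```

With perfect ML and lint scores and no complexity penalty, the reward tops out at 0.8 before clamping ever engages, which leaves headroom for shaping.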
__init__.py CHANGED
@@ -1,7 +1,8 @@
  """Public package exports for python_code_review_env."""
 
  from .client import PythonCodeReviewEnv, PythonEnv
- from .models import (
+ from .models import PyTorchCodeAnalyzerModel
+ from .Models import (
      PythonAction,
      PythonCodeReviewAction,
      PythonCodeReviewObservation,
@@ -9,6 +10,10 @@ from .models import (
      PythonObservation,
      PythonState,
  )
+ from .schemas import AnalyzeCodeRequest, AnalyzeCodeResponse
+ from .services import AnalysisService
+ from .triage import CodeTriageEngine, HashingEmbeddingBackend, TransformersEmbeddingBackend, get_default_engine
+ from .triage_models import TriageResult
 
  __all__ = [
      "PythonAction",
@@ -19,4 +24,13 @@ __all__ = [
      "PythonCodeReviewState",
      "PythonCodeReviewEnv",
      "PythonEnv",
+     "AnalyzeCodeRequest",
+     "AnalyzeCodeResponse",
+     "AnalysisService",
+     "CodeTriageEngine",
+     "HashingEmbeddingBackend",
+     "PyTorchCodeAnalyzerModel",
+     "TransformersEmbeddingBackend",
+     "TriageResult",
+     "get_default_engine",
  ]
analyzers/__init__.py ADDED
@@ -0,0 +1,13 @@
+ """Domain-specific analyzers for multi-domain code understanding."""
+
+ from .dsa_analyzer import analyze_dsa_code
+ from .ds_analyzer import analyze_data_science_code
+ from .ml_analyzer import analyze_ml_code
+ from .web_analyzer import analyze_web_code
+
+ __all__ = [
+     "analyze_dsa_code",
+     "analyze_data_science_code",
+     "analyze_ml_code",
+     "analyze_web_code",
+ ]
analyzers/ds_analyzer.py ADDED
@@ -0,0 +1,56 @@
+ """Analyzer for data-science oriented Python code."""
+
+ from __future__ import annotations
+
+ from typing import Any, Dict
+
+ from schemas.response import AnalysisIssue, DomainAnalysis
+
+
+ def analyze_data_science_code(code: str, parsed: Dict[str, Any], complexity: Dict[str, Any]) -> DomainAnalysis:
+     """Inspect pandas and numpy code for vectorization and leakage concerns."""
+
+     issues = []
+     suggestions = []
+     score = 0.72
+
+     if "iterrows(" in code or "itertuples(" in code:
+         issues.append(
+             AnalysisIssue(
+                 title="Row-wise dataframe iteration detected",
+                 severity="medium",
+                 description="Looping through dataframe rows is usually slower and less scalable than vectorized operations.",
+             )
+         )
+         suggestions.append("Use vectorized pandas or numpy expressions instead of row-wise iteration.")
+         score -= 0.18
+
+     if "inplace=True" in code:
+         suggestions.append("Avoid inplace mutation to keep data pipelines easier to reason about and test.")
+         score -= 0.05
+
+     if "fit_transform(" in code and "train_test_split" not in code:
+         issues.append(
+             AnalysisIssue(
+                 title="Potential data leakage risk",
+                 severity="high",
+                 description="Feature transforms appear before an explicit train/test split.",
+             )
+         )
+         suggestions.append("Split train and validation data before fitting stateful preprocessing steps.")
+         score -= 0.2
+
+     if not suggestions:
+         suggestions.append("Add schema assumptions and null-handling checks for production data quality.")
+
+     return DomainAnalysis(
+         domain="data_science",
+         domain_score=max(0.05, round(score, 4)),
+         issues=issues,
+         suggestions=suggestions,
+         highlights={
+             "vectorization_risk": float("iterrows(" in code or "itertuples(" in code),
+             "time_complexity": complexity["time_complexity"],
+             "uses_pandas": float(parsed.get("uses_pandas", False)),
+         },
+     )
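The leakage check above flags `fit_transform` appearing before a train/test split. A dependency-free sketch of the safe ordering it recommends, using a toy stateful preprocessor (the `Standardizer` class here is illustrative, not part of this repository):

```python
class Standardizer:
    """Toy stateful preprocessor: learns mean and std from one dataset only."""

    def fit(self, values):
        self.mean = sum(values) / len(values)
        variance = sum((v - self.mean) ** 2 for v in values) / len(values)
        self.std = variance ** 0.5 or 1.0  # guard against zero spread
        return self

    def transform(self, values):
        return [(v - self.mean) / self.std for v in values]


# Split first, then fit on the training portion only, so validation
# statistics never leak into the fitted preprocessor.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train, valid = data[:4], data[4:]
scaler = Standardizer().fit(train)
train_scaled = scaler.transform(train)
valid_scaled = scaler.transform(valid)
```

Fitting on `data` as a whole and then splitting is the pattern the analyzer penalizes: validation rows would have influenced the learned mean and std.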
analyzers/dsa_analyzer.py ADDED
@@ -0,0 +1,48 @@
+ """Analyzer for DSA and competitive-programming style Python code."""
+
+ from __future__ import annotations
+
+ from typing import Any, Dict
+
+ from schemas.response import AnalysisIssue, DomainAnalysis
+
+
+ def analyze_dsa_code(code: str, parsed: Dict[str, Any], complexity: Dict[str, Any]) -> DomainAnalysis:
+     """Inspect algorithmic code for brute-force patterns and efficiency risks."""
+
+     issues = []
+     suggestions = []
+     score = 0.7
+
+     if parsed.get("max_loop_depth", 0) >= 2:
+         issues.append(
+             AnalysisIssue(
+                 title="Nested loops suggest brute-force behavior",
+                 severity="medium",
+                 description="The implementation scans the input multiple times, which is often avoidable in DSA problems.",
+             )
+         )
+         suggestions.append("Consider replacing nested scans with a hashmap, prefix table, or sorted search strategy.")
+         score -= 0.15
+
+     if parsed.get("uses_recursion"):
+         suggestions.append("Verify recursion depth and add memoization or iterative conversion if the input size can grow.")
+         score -= 0.05
+
+     if "sorted(" in code or ".sort(" in code:
+         suggestions.append("Sorting is acceptable here, but validate whether a direct O(n) pass can remove the sort.")
+
+     if not suggestions:
+         suggestions.append("Document the intended time complexity and add edge-case checks for empty input and duplicates.")
+
+     return DomainAnalysis(
+         domain="dsa",
+         domain_score=max(0.05, round(score, 4)),
+         issues=issues,
+         suggestions=suggestions,
+         highlights={
+             "time_complexity": complexity["time_complexity"],
+             "space_complexity": complexity["space_complexity"],
+             "max_loop_depth": float(parsed.get("max_loop_depth", 0)),
+         },
+     )
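The hashmap rewrite this analyzer suggests for nested scans can be sketched concretely. The commit's own example inputs use a brute-force `two_sum`; a single-pass alternative looks like this (a sketch of the suggested pattern, not code from the repository):

```python
def two_sum(nums, target):
    """Single O(n) pass: remember each value's index, look up the complement."""
    seen = {}
    for i, value in enumerate(nums):
        complement = target - value
        if complement in seen:
            # The complement was seen earlier, so the pair is complete.
            return [seen[complement], i]
        seen[value] = i
    return []
```

This trades O(n) extra memory for removing the inner loop, which is exactly the "replace nested scans with a hashmap" suggestion above.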
analyzers/ml_analyzer.py ADDED
@@ -0,0 +1,61 @@
+ """Analyzer for machine-learning and deep-learning code."""
+
+ from __future__ import annotations
+
+ from typing import Any, Dict
+
+ from schemas.response import AnalysisIssue, DomainAnalysis
+
+
+ def analyze_ml_code(code: str, parsed: Dict[str, Any], complexity: Dict[str, Any]) -> DomainAnalysis:
+     """Inspect training and inference logic for common ML / DL mistakes."""
+
+     issues = []
+     suggestions = []
+     score = 0.74
+
+     if "torch" in code and "model.eval()" not in code and "predict" in code.lower():
+         issues.append(
+             AnalysisIssue(
+                 title="Inference path may be missing eval mode",
+                 severity="high",
+                 description="Inference code should place the model in eval mode before prediction.",
+             )
+         )
+         suggestions.append("Call model.eval() before inference to disable training-time behavior such as dropout.")
+         score -= 0.18
+
+     if "torch" in code and "no_grad" not in code and "predict" in code.lower():
+         suggestions.append("Wrap inference in torch.no_grad() to reduce memory usage and avoid unnecessary gradient tracking.")
+         score -= 0.12
+
+     if parsed.get("calls_backward") and not parsed.get("calls_optimizer_step"):
+         issues.append(
+             AnalysisIssue(
+                 title="Backward pass without optimizer step",
+                 severity="medium",
+                 description="Gradients are computed, but the optimizer step is not obvious in the snippet.",
+             )
+         )
+         suggestions.append("Ensure optimizer.step() and optimizer.zero_grad() are placed correctly in the training loop.")
+         score -= 0.12
+
+     if "CrossEntropyLoss" in code and "softmax(" in code:
+         suggestions.append("CrossEntropyLoss expects raw logits; remove the explicit softmax before the loss when possible.")
+         score -= 0.05
+
+     if not suggestions:
+         suggestions.append("Add explicit train/eval mode transitions and log validation metrics during training.")
+
+     return DomainAnalysis(
+         domain="ml_dl",
+         domain_score=max(0.05, round(score, 4)),
+         issues=issues,
+         suggestions=suggestions,
+         highlights={
+             "uses_torch": float(parsed.get("uses_torch", False)),
+             "has_eval_mode": float("model.eval()" in code),
+             "has_no_grad": float("no_grad" in code),
+             "time_complexity": complexity["time_complexity"],
+         },
+     )
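The `CrossEntropyLoss` check above rests on softmax being applied exactly once: re-softmaxing already-normalized probabilities compresses the distribution and weakens the loss signal. A small stdlib sketch makes the effect visible (the logits here are illustrative values, not from the repository):

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of raw scores."""
    m = max(xs)  # subtract the max to avoid overflow in exp
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 1.0, 0.0]
once = softmax(logits)   # sharp distribution: top class dominates
twice = softmax(once)    # softmax of probabilities: much flatter
```

The double-softmaxed distribution is noticeably flatter than the single pass, which is why the analyzer suggests passing raw logits to `CrossEntropyLoss`.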
analyzers/web_analyzer.py ADDED
@@ -0,0 +1,50 @@
+ """Analyzer for FastAPI and backend web-service code."""
+
+ from __future__ import annotations
+
+ from typing import Any, Dict
+
+ from schemas.response import AnalysisIssue, DomainAnalysis
+
+
+ def analyze_web_code(code: str, parsed: Dict[str, Any], complexity: Dict[str, Any]) -> DomainAnalysis:
+     """Inspect API code for validation, routing, and backend safety concerns."""
+
+     issues = []
+     suggestions = []
+     score = 0.76
+
+     route_decorators = set(parsed.get("route_decorators", []))
+     if route_decorators and not parsed.get("uses_pydantic"):
+         issues.append(
+             AnalysisIssue(
+                 title="Request validation model is missing",
+                 severity="high",
+                 description="Route handlers appear present, but no obvious Pydantic validation layer was detected.",
+             )
+         )
+         suggestions.append("Add Pydantic request and response models for strict validation and type-safe contracts.")
+         score -= 0.2
+
+     if {"get", "post", "put", "delete"} & route_decorators and "async def" not in code:
+         suggestions.append("Prefer async FastAPI endpoints when the route performs I/O or awaits downstream services.")
+         score -= 0.08
+
+     if "request.json()" in code or "request.body()" in code:
+         suggestions.append("Validate raw request payloads before use; avoid trusting unchecked JSON input.")
+         score -= 0.08
+
+     if not suggestions:
+         suggestions.append("Add domain-specific response models and centralize dependency injection for cleaner API structure.")
+
+     return DomainAnalysis(
+         domain="web",
+         domain_score=max(0.05, round(score, 4)),
+         issues=issues,
+         suggestions=suggestions,
+         highlights={
+             "route_count": float(len(route_decorators)),
+             "uses_validation": float(parsed.get("uses_pydantic", False)),
+             "time_complexity": complexity["time_complexity"],
+         },
+     )
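The missing-validation issue above is normally solved with Pydantic request models; a dependency-free sketch of the same idea shows what "validate before use" means for a raw payload (the `validate_task_payload` function and its fields are hypothetical, not part of this repository):

```python
def validate_task_payload(payload):
    """Reject untyped or incomplete task payloads before any handler logic runs."""
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    title = payload.get("title")
    if not isinstance(title, str) or not title.strip():
        raise ValueError("'title' must be a non-empty string")
    priority = payload.get("priority", 1)
    if not isinstance(priority, int) or not 1 <= priority <= 5:
        raise ValueError("'priority' must be an integer from 1 to 5")
    # Return a normalized copy so handlers never see raw input.
    return {"title": title.strip(), "priority": priority}
```

A Pydantic model expresses the same constraints declaratively and, in FastAPI, turns violations into 422 responses automatically, which is why the analyzer scores its absence heavily.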
api/__init__.py ADDED
@@ -0,0 +1,5 @@
+ """FastAPI backend package for the multi-domain analyzer."""
+
+ from .main import app
+
+ __all__ = ["app"]
api/main.py ADDED
@@ -0,0 +1,27 @@
+ """FastAPI backend for the multi-domain AI code analyzer."""
+
+ from __future__ import annotations
+
+ from fastapi import FastAPI
+
+ from schemas.request import AnalyzeCodeRequest
+ from schemas.response import AnalyzeCodeResponse
+ from services.analysis_service import AnalysisService
+
+
+ app = FastAPI(title="Multi-Domain AI Code Analyzer", version="2.0.0")
+ analysis_service = AnalysisService()
+
+
+ @app.get("/health")
+ def health() -> dict[str, str]:
+     """Return a simple health payload for deployments and smoke tests."""
+
+     return {"status": "ok"}
+
+
+ @app.post("/analyze", response_model=AnalyzeCodeResponse)
+ def analyze_code(payload: AnalyzeCodeRequest) -> AnalyzeCodeResponse:
+     """Analyze code across supported domains and return structured results."""
+
+     return analysis_service.analyze(payload)
app/__init__.py ADDED
@@ -0,0 +1 @@
+ """Streamlit UI package for the multi-domain analyzer."""
app/examples.py ADDED
@@ -0,0 +1,31 @@
+ """Example snippets for each supported analysis domain."""
+
+ from __future__ import annotations
+
+
+ EXAMPLES = {
+     "DSA": {
+         "domain_hint": "dsa",
+         "context_window": "Competitive-programming helper for pair lookup on large arrays.",
+         "traceback_text": "",
+         "code": """def two_sum(nums, target):\n    for i in range(len(nums)):\n        for j in range(i + 1, len(nums)):\n            if nums[i] + nums[j] == target:\n                return [i, j]\n    return []\n""",
+     },
+     "Data Science": {
+         "domain_hint": "data_science",
+         "context_window": "Feature engineering step in a churn-prediction notebook.",
+         "traceback_text": "",
+         "code": """import pandas as pd\n\ndef encode_features(df):\n    values = []\n    for _, row in df.iterrows():\n        values.append(row['age'] * row['sessions'])\n    df['score'] = values\n    return df\n""",
+     },
+     "ML / DL": {
+         "domain_hint": "ml_dl",
+         "context_window": "Inference utility for a PyTorch classifier used in a batch review job.",
+         "traceback_text": "",
+         "code": """import torch\n\nclass Predictor:\n    def __init__(self, model):\n        self.model = model\n\n    def predict(self, batch):\n        outputs = self.model(batch)\n        return outputs.argmax(dim=1)\n""",
+     },
+     "Web / FastAPI": {
+         "domain_hint": "web",
+         "context_window": "Backend endpoint for creating review tasks from user-submitted payloads.",
+         "traceback_text": "",
+         "code": """from fastapi import FastAPI, Request\n\napp = FastAPI()\n\n@app.post('/tasks')\ndef create_task(request: Request):\n    payload = request.json()\n    return {'task': payload}\n""",
+     },
+ }
app/streamlit_app.py ADDED
@@ -0,0 +1,100 @@
+ """Streamlit frontend for the multi-domain analyzer platform."""
+
+ from __future__ import annotations
+
+ import streamlit as st
+
+ from app.examples import EXAMPLES
+ from schemas.request import AnalyzeCodeRequest
+ from services.analysis_service import AnalysisService
+
+
+ analysis_service = AnalysisService()
+
+
+ def _analyze(code: str, context_window: str, traceback_text: str, domain_hint: str):
+     """Run the analysis service with validated request payloads."""
+
+     request = AnalyzeCodeRequest(
+         code=code,
+         context_window=context_window,
+         traceback_text=traceback_text,
+         domain_hint=domain_hint,  # type: ignore[arg-type]
+     )
+     return analysis_service.analyze(request)
+
+
+ def main() -> None:
+     """Render the Streamlit UI."""
+
+     st.set_page_config(page_title="Multi-Domain AI Code Analyzer", layout="wide")
+     st.title("Multi-Domain AI Code Analyzer & Improvement System")
+     st.caption("PyTorch-powered code review across DSA, Data Science, ML/DL, and Web backend code.")
+
+     example_name = st.selectbox("Example input", list(EXAMPLES.keys()))
+     example = EXAMPLES[example_name]
+     auto_analyze = st.toggle("Real-time scoring", value=True)
+
+     left, right = st.columns([1.2, 1.0])
+     with left:
+         code = st.text_area("Code input", value=example["code"], height=420)
+         context_window = st.text_area("Context window", value=example["context_window"], height=100)
+         traceback_text = st.text_area("Optional traceback / runtime hint", value=example["traceback_text"], height=100)
+         domain_hint = st.selectbox("Domain hint", ["auto", "dsa", "data_science", "ml_dl", "web"], index=["auto", "dsa", "data_science", "ml_dl", "web"].index(example["domain_hint"]))
+         analyze_clicked = st.button("Analyze Code", type="primary")
+
+     result = None
+     if code and (analyze_clicked or auto_analyze):
+         result = _analyze(code, context_window, traceback_text, domain_hint)
+
+     with right:
+         if result is None:
+             st.info("Paste code or load an example to start analysis.")
+         else:
+             metric_cols = st.columns(4)
+             metric_cols[0].metric("Detected domain", result.detected_domain)
+             metric_cols[1].metric("ML score", f"{result.score_breakdown.ml_score:.0%}")
+             metric_cols[2].metric("Domain score", f"{result.score_breakdown.domain_score:.0%}")
+             metric_cols[3].metric("Reward", f"{result.score_breakdown.reward:.0%}")
+             st.bar_chart(result.domain_confidences)
+             st.caption(result.summary)
+
+     if result is not None:
+         overview_tab, suggestions_tab, domain_tab, static_tab = st.tabs(
+             ["Overview", "Suggestions", "Domain Detail", "Static Analysis"]
+         )
+
+         with overview_tab:
+             st.subheader("Improvement Plan")
+             for step in result.improvement_plan:
+                 st.write(f"- {step}")
+             st.subheader("Complexity")
+             st.write(
+                 {
+                     "time_complexity": result.static_analysis.time_complexity,
+                     "space_complexity": result.static_analysis.space_complexity,
+                     "cyclomatic_complexity": result.static_analysis.cyclomatic_complexity,
+                 }
+             )
+
+         with suggestions_tab:
+             st.subheader("Suggestions")
+             for suggestion in result.domain_analysis.suggestions:
+                 st.write(f"- {suggestion}")
+             if result.domain_analysis.issues:
+                 st.subheader("Issues")
+                 for issue in result.domain_analysis.issues:
+                     st.write(f"- [{issue.severity}] {issue.title}: {issue.description}")
+
+         with domain_tab:
+             st.subheader("Domain Highlights")
+             st.json(result.domain_analysis.highlights)
+             st.write(f"Domain score: {result.domain_analysis.domain_score:.0%}")
+
+         with static_tab:
+             st.subheader("Static Analysis")
+             st.json(result.static_analysis.model_dump())
+
+
+ if __name__ == "__main__":
+     main()
client.py CHANGED
@@ -7,7 +7,7 @@ from typing import Dict
  from openenv.core import EnvClient
  from openenv.core.client_types import StepResult
 
- from .models import (
+ from .Models import (
      PythonCodeReviewAction,
      PythonCodeReviewObservation,
      PythonCodeReviewState,
graders/bug_fix.py CHANGED
@@ -3,10 +3,10 @@
  from __future__ import annotations
 
  try:
-     from ..models import TaskGrade
+     from ..Models import TaskGrade
      from ..tasks.catalog import ReviewTask
  except ImportError:
-     from models import TaskGrade
+     from Models import TaskGrade
      from tasks.catalog import ReviewTask
 
  from .shared import (
graders/dispatch.py CHANGED
@@ -3,10 +3,10 @@
  from __future__ import annotations
 
  try:
-     from ..models import TaskGrade
+     from ..Models import TaskGrade
      from ..tasks.catalog import ReviewTask
  except ImportError:
-     from models import TaskGrade
+     from Models import TaskGrade
      from tasks.catalog import ReviewTask
 
  from .bug_fix import grade_bug_fix_task
graders/optimization.py CHANGED
@@ -3,10 +3,10 @@
  from __future__ import annotations
 
  try:
-     from ..models import TaskGrade
+     from ..Models import TaskGrade
      from ..tasks.catalog import ReviewTask
  except ImportError:
-     from models import TaskGrade
+     from Models import TaskGrade
      from tasks.catalog import ReviewTask
 
  from .shared import (
graders/shared.py CHANGED
@@ -11,10 +11,10 @@ import traceback
  from typing import Any, Callable, Dict, List
 
  try:
-     from ..models import TaskGrade
+     from ..Models import TaskGrade
      from ..tasks.catalog import CallCase, ReviewTask
  except ImportError:
-     from models import TaskGrade
+     from Models import TaskGrade
      from tasks.catalog import CallCase, ReviewTask
 
 
graders/syntax.py CHANGED
@@ -3,10 +3,10 @@
  from __future__ import annotations
 
  try:
-     from ..models import TaskGrade
+     from ..Models import TaskGrade
      from ..tasks.catalog import ReviewTask
  except ImportError:
-     from models import TaskGrade
+     from Models import TaskGrade
      from tasks.catalog import ReviewTask
 
  from .shared import (
inference.py CHANGED
@@ -28,7 +28,7 @@ except Exception:
     PythonCodeReviewEnvironment = None  # type: ignore[assignment]
 
 try:
-    from models import PythonCodeReviewAction
+    from Models import PythonCodeReviewAction
 except Exception:
     PythonCodeReviewAction = None  # type: ignore[assignment]
 
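The `try`/`except ImportError` pairs that this commit rewrites let each module load both as a package submodule (relative import) and as a loose top-level module inside the Docker image. A minimal, self-contained sketch of the same fallback pattern — `missing_package` is a hypothetical name used only to force the except branch, and the stub class is not the project's real `TaskGrade`:

```python
# Dual-mode import: prefer the installed package, fall back to a local stand-in.
try:
    from missing_package import TaskGrade  # hypothetical module; not installed here
except ImportError:
    # Fallback path, analogous to the bare "from Models import ..." branches above.
    class TaskGrade:
        """Minimal stand-in so the rest of the module can still load."""

        def __init__(self, score: float) -> None:
            self.score = score

grade = TaskGrade(0.75)
print(grade.score)  # → 0.75
```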
launch.py ADDED
@@ -0,0 +1,35 @@
+"""Launch the FastAPI backend and Streamlit UI in one Docker container."""
+
+from __future__ import annotations
+
+import subprocess
+import sys
+
+
+def main() -> int:
+    """Start the API backend in the background and keep Streamlit in the foreground."""
+
+    api_process = subprocess.Popen(
+        ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8001"],
+    )
+    try:
+        return subprocess.call(
+            [
+                "streamlit",
+                "run",
+                "app/streamlit_app.py",
+                "--server.port",
+                "8000",
+                "--server.address",
+                "0.0.0.0",
+                "--server.headless",
+                "true",
+            ]
+        )
+    finally:
+        api_process.terminate()
+        api_process.wait(timeout=10)
+
+
+if __name__ == "__main__":
+    sys.exit(main())
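`launch.py` supervises uvicorn as a background child while blocking on Streamlit, then tears the child down in a `finally` block. The same supervise-and-cleanup shape can be exercised with plain `python -c` commands standing in for the two real servers (a sketch, not the actual launcher):

```python
import subprocess
import sys

# Background "server" (stands in for uvicorn): sleeps until terminated.
background = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
try:
    # Foreground "UI" (stands in for Streamlit): exits on its own.
    exit_code = subprocess.call([sys.executable, "-c", "print('ui finished')"])
finally:
    # Mirror launch.py's cleanup: terminate the child, then reap it.
    background.terminate()
    background.wait(timeout=10)

print(exit_code)  # → 0
```

Returning the foreground process's exit code (as `main()` does) lets the container report Streamlit failures to Docker.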
models/__init__.py ADDED
@@ -0,0 +1,5 @@
+"""PyTorch-backed model wrappers for the analyzer platform."""
+
+from .pytorch_model import PyTorchCodeAnalyzerModel
+
+__all__ = ["PyTorchCodeAnalyzerModel"]
models/pytorch_model.py ADDED
@@ -0,0 +1,149 @@
+"""PyTorch + transformers model wrapper for multi-domain code scoring."""
+
+from __future__ import annotations
+
+import hashlib
+from typing import Dict, List, Sequence
+
+import torch
+import torch.nn.functional as F
+
+try:
+    from transformers import AutoModel, AutoTokenizer
+except Exception:
+    AutoModel = None  # type: ignore[assignment]
+    AutoTokenizer = None  # type: ignore[assignment]
+
+
+DOMAIN_PROTOTYPES: Dict[str, List[str]] = {
+    "dsa": [
+        "Binary search, hashmap optimization, recursion, dynamic programming, arrays, trees, graphs, stack, queue, complexity.",
+        "Competitive programming algorithm with loops, memoization, prefix sums, and asymptotic analysis.",
+    ],
+    "data_science": [
+        "Pandas dataframe transformation, numpy vectorization, feature leakage, train test split, iterrows misuse.",
+        "Data cleaning pipeline using pandas, numpy, aggregation, joins, and vectorized operations.",
+    ],
+    "ml_dl": [
+        "PyTorch model, training loop, optimizer, backward pass, eval mode, no_grad, loss function, dataloader.",
+        "Machine learning inference and training code with torch, sklearn, tensors, gradients, and model checkpoints.",
+    ],
+    "web": [
+        "FastAPI endpoint, request validation, Pydantic models, async routes, API security, backend service design.",
+        "REST API backend with routers, dependency injection, input validation, serialization, and error handling.",
+    ],
+    "general": [
+        "General Python utility code with readable structure, typing, tests, and maintainable abstractions.",
+    ],
+}
+
+QUALITY_ANCHORS: Dict[str, List[str]] = {
+    "high": [
+        "Readable typed Python code with validation, efficient algorithms, vectorized operations, safe inference, and clean API boundaries.",
+        "Production-ready code with small functions, docstrings, low complexity, and clear error handling.",
+    ],
+    "low": [
+        "Brute-force nested loops, missing validation, unsafe input handling, missing eval mode, missing no_grad, and code smells.",
+        "Hard to maintain code with high complexity, repeated scans, mutable side effects, and unclear structure.",
+    ],
+}
+
+
+class _HashEmbeddingBackend:
+    """Torch-native fallback when pretrained weights cannot be loaded."""
+
+    def __init__(self, dimensions: int = 128) -> None:
+        self.dimensions = dimensions
+        self.model_id = "hashed-token-fallback"
+        self.backend_name = "hashed-token-fallback"
+        self.notes = ["Using hashed embeddings because pretrained transformer weights are unavailable."]
+
+    def embed_texts(self, texts: Sequence[str]) -> torch.Tensor:
+        matrix = torch.zeros((len(texts), self.dimensions), dtype=torch.float32)
+        for row_index, text in enumerate(texts):
+            tokens = text.lower().split()[:512]
+            if not tokens:
+                matrix[row_index, 0] = 1.0
+                continue
+            for token in tokens:
+                digest = hashlib.md5(token.encode("utf-8")).hexdigest()
+                bucket = int(digest[:8], 16) % self.dimensions
+                sign = -1.0 if int(digest[8:10], 16) % 2 else 1.0
+                matrix[row_index, bucket] += sign
+        return F.normalize(matrix + 1e-6, dim=1)
+
+
+class PyTorchCodeAnalyzerModel:
+    """Score code using pretrained transformer embeddings plus prototype similarity."""
+
+    def __init__(self, model_id: str = "huggingface/CodeBERTa-small-v1") -> None:
+        self.model_id = model_id
+        self.backend_name = model_id
+        self.notes: List[str] = []
+        self._tokenizer = None
+        self._model = None
+        self._fallback = _HashEmbeddingBackend()
+        self._prototype_cache: Dict[str, torch.Tensor] = {}
+
+    def _ensure_loaded(self) -> None:
+        if self._model is not None or self.notes:
+            return
+        if AutoTokenizer is None or AutoModel is None:
+            self.backend_name = self._fallback.backend_name
+            self.notes = list(self._fallback.notes)
+            return
+        try:
+            self._tokenizer = AutoTokenizer.from_pretrained(self.model_id)
+            self._model = AutoModel.from_pretrained(self.model_id)
+            self._model.eval()
+            self.notes.append(f"Loaded pretrained encoder `{self.model_id}`.")
+        except Exception as exc:
+            self.backend_name = self._fallback.backend_name
+            self.notes = list(self._fallback.notes) + [f"Pretrained load failed: {type(exc).__name__}: {exc}"]
+
+    def _embed_texts(self, texts: Sequence[str]) -> torch.Tensor:
+        self._ensure_loaded()
+        if self._model is None or self._tokenizer is None:
+            return self._fallback.embed_texts(texts)
+        encoded = self._tokenizer(list(texts), padding=True, truncation=True, max_length=256, return_tensors="pt")
+        with torch.no_grad():
+            outputs = self._model(**encoded)
+        hidden = outputs.last_hidden_state
+        mask = encoded["attention_mask"].unsqueeze(-1)
+        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
+        return F.normalize(pooled, dim=1)
+
+    def _prototype_matrix(self, bucket: str, texts: Sequence[str]) -> torch.Tensor:
+        if bucket not in self._prototype_cache:
+            self._prototype_cache[bucket] = self._embed_texts(texts)
+        return self._prototype_cache[bucket]
+
+    def predict(self, code: str, context_window: str, static_summary: Dict[str, object]) -> Dict[str, object]:
+        """Predict domain probabilities and a model quality score."""
+
+        document = (
+            f"Code:\n{code.strip()[:4000]}\n\n"
+            f"Context:\n{context_window.strip()[:1000]}\n\n"
+            f"Static hints:\n{static_summary}\n"
+        )
+        candidate = self._embed_texts([document])
+
+        domain_scores: Dict[str, float] = {}
+        for domain, texts in DOMAIN_PROTOTYPES.items():
+            matrix = self._prototype_matrix(f"domain:{domain}", texts)
+            similarity = torch.matmul(candidate, matrix.T).max().item()
+            domain_scores[domain] = round((similarity + 1.0) / 2.0, 4)
+
+        high_matrix = self._prototype_matrix("quality:high", QUALITY_ANCHORS["high"])
+        low_matrix = self._prototype_matrix("quality:low", QUALITY_ANCHORS["low"])
+        high_similarity = torch.matmul(candidate, high_matrix.T).max().item()
+        low_similarity = torch.matmul(candidate, low_matrix.T).max().item()
+        ml_quality_score = torch.sigmoid(torch.tensor((high_similarity - low_similarity) * 4.0)).item()
+
+        return {
+            "domain_scores": domain_scores,
+            "ml_quality_score": round(float(ml_quality_score), 4),
+            "backend_name": self.backend_name,
+            "model_id": self.model_id,
+            "notes": list(self.notes),
+        }
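The `_HashEmbeddingBackend` fallback is plain feature hashing: each token's MD5 digest selects a bucket and a sign. That arithmetic needs nothing from torch, so it can be checked in isolation. A dependency-free sketch of the same mapping (this `hash_bucket` helper is illustrative; the class itself accumulates these values into a tensor and normalizes it):

```python
import hashlib

def hash_bucket(token: str, dimensions: int = 128) -> tuple[int, float]:
    """Map a token to the same (bucket, sign) pair the fallback backend computes."""
    digest = hashlib.md5(token.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % dimensions        # first 8 hex chars pick the slot
    sign = -1.0 if int(digest[8:10], 16) % 2 else 1.0  # next byte's parity picks the sign
    return bucket, sign

# MD5 is deterministic, so repeated tokens always hit the same signed slot.
b1, s1 = hash_bucket("def")
b2, s2 = hash_bucket("def")
print(b1 == b2 and s1 == s2)        # → True
print(0 <= b1 < 128 and s1 in (-1.0, 1.0))  # → True
```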
pyproject.toml CHANGED
@@ -5,14 +5,18 @@ build-backend = "setuptools.build_meta"
 [project]
 name = "openenv-python-code-review-env"
 version = "1.0.0"
-description = "Production-grade OpenEnv environment for Python code review workflows."
+description = "TorchReview Copilot: AI-powered Python code triage with PyTorch and OpenEnv validation."
 readme = "README.md"
 requires-python = ">=3.10"
 dependencies = [
     "fastapi>=0.111.0",
+    "gradio>=5.26.0",
     "openai>=1.76.0",
     "openenv-core[core]>=0.2.2",
     "pytest>=8.0.0",
+    "streamlit>=1.44.0",
+    "torch>=2.2.0",
+    "transformers>=4.45.0",
     "uvicorn>=0.30.0",
 ]
 
@@ -31,5 +35,12 @@ packages = [
     "python_env.server",
     "python_env.tasks",
     "python_env.graders",
+    "python_env.api",
+    "python_env.app",
+    "python_env.analyzers",
+    "python_env.models",
+    "python_env.schemas",
+    "python_env.services",
+    "python_env.utils",
 ]
-package-dir = { "python_env" = ".", "python_env.server" = "server", "python_env.tasks" = "tasks", "python_env.graders" = "graders" }
+package-dir = { "python_env" = ".", "python_env.server" = "server", "python_env.tasks" = "tasks", "python_env.graders" = "graders", "python_env.api" = "api", "python_env.app" = "app", "python_env.analyzers" = "analyzers", "python_env.models" = "models", "python_env.schemas" = "schemas", "python_env.services" = "services", "python_env.utils" = "utils" }
schemas/__init__.py ADDED
@@ -0,0 +1,13 @@
+"""Public schemas for the multi-domain analysis platform."""
+
+from .request import AnalyzeCodeRequest
+from .response import AnalyzeCodeResponse, AnalysisIssue, DomainAnalysis, ScoreBreakdown, StaticAnalysisSummary
+
+__all__ = [
+    "AnalyzeCodeRequest",
+    "AnalyzeCodeResponse",
+    "AnalysisIssue",
+    "DomainAnalysis",
+    "ScoreBreakdown",
+    "StaticAnalysisSummary",
+]
schemas/request.py ADDED
@@ -0,0 +1,19 @@
+"""Request schemas for code analysis endpoints and UI."""
+
+from __future__ import annotations
+
+from typing import Literal
+
+from pydantic import BaseModel, Field
+
+
+DomainHint = Literal["auto", "dsa", "data_science", "ml_dl", "web"]
+
+
+class AnalyzeCodeRequest(BaseModel):
+    """Validated input payload for multi-domain code analysis."""
+
+    code: str = Field(..., min_length=1, description="Source code to analyze.")
+    context_window: str = Field(default="", max_length=2000, description="Optional repository or task context.")
+    traceback_text: str = Field(default="", max_length=2000, description="Optional runtime or test failure output.")
+    domain_hint: DomainHint = Field(default="auto", description="Optional domain override when auto detection is not desired.")
schemas/response.py ADDED
@@ -0,0 +1,70 @@
+"""Response schemas for the multi-domain analysis platform."""
+
+from __future__ import annotations
+
+from typing import Dict, List, Literal
+
+from pydantic import BaseModel, Field
+
+
+DomainType = Literal["dsa", "data_science", "ml_dl", "web", "general"]
+Severity = Literal["low", "medium", "high"]
+
+
+class AnalysisIssue(BaseModel):
+    """One detected issue or risk in the code snippet."""
+
+    title: str
+    severity: Severity
+    description: str
+    line_hint: int | None = None
+
+
+class StaticAnalysisSummary(BaseModel):
+    """Language-agnostic static-analysis signals."""
+
+    syntax_valid: bool
+    syntax_error: str = ""
+    cyclomatic_complexity: int = Field(..., ge=1)
+    line_count: int = Field(..., ge=0)
+    max_loop_depth: int = Field(..., ge=0)
+    time_complexity: str = "Unknown"
+    space_complexity: str = "Unknown"
+    detected_imports: List[str] = Field(default_factory=list)
+    code_smells: List[str] = Field(default_factory=list)
+
+
+class DomainAnalysis(BaseModel):
+    """Domain-specific analysis payload returned by an analyzer."""
+
+    domain: DomainType
+    domain_score: float = Field(..., ge=0.0, le=1.0)
+    issues: List[AnalysisIssue] = Field(default_factory=list)
+    suggestions: List[str] = Field(default_factory=list)
+    highlights: Dict[str, float | str] = Field(default_factory=dict)
+
+
+class ScoreBreakdown(BaseModel):
+    """Reward inputs and final normalized score."""
+
+    ml_score: float = Field(..., ge=0.0, le=1.0)
+    domain_score: float = Field(..., ge=0.0, le=1.0)
+    lint_score: float = Field(..., ge=0.0, le=1.0)
+    complexity_penalty: float = Field(..., ge=0.0, le=1.0)
+    reward: float = Field(..., ge=0.0, le=1.0)
+
+
+class AnalyzeCodeResponse(BaseModel):
+    """Top-level structured output for API and UI consumers."""
+
+    detected_domain: DomainType
+    domain_confidences: Dict[str, float]
+    score_breakdown: ScoreBreakdown
+    static_analysis: StaticAnalysisSummary
+    domain_analysis: DomainAnalysis
+    improvement_plan: List[str] = Field(default_factory=list)
+    model_backend: str
+    model_id: str
+    summary: str
+    context_window: str = ""
+    analysis_time_ms: float = Field(..., ge=0.0)
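`AnalyzeCodeRequest` rejects empty code and caps both context fields at 2,000 characters. A dependency-free sketch of those same constraints as plain functions (this mirrors the Pydantic `Field` rules above; it is not the model itself and skips `domain_hint`):

```python
def validate_request(code: str, context_window: str = "", traceback_text: str = "") -> dict:
    """Mirror AnalyzeCodeRequest's length constraints without Pydantic."""
    if len(code) < 1:
        raise ValueError("code: min_length=1")
    for name, value in (("context_window", context_window), ("traceback_text", traceback_text)):
        if len(value) > 2000:
            raise ValueError(f"{name}: max_length=2000")
    return {"code": code, "context_window": context_window, "traceback_text": traceback_text}

payload = validate_request("print('hi')", context_window="invoice module")
print(payload["code"])  # → print('hi')
try:
    validate_request("")
except ValueError as exc:
    print(exc)  # → code: min_length=1
```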
server/app.py CHANGED
@@ -1,4 +1,4 @@
-"""FastAPI entrypoint for python_code_review_env."""
+"""FastAPI + Gradio entrypoint for TorchReview Copilot."""
 
 from __future__ import annotations
 
@@ -10,20 +10,36 @@ except Exception as exc:  # pragma: no cover
     ) from exc
 
 try:
+    import gradio as gr
+except Exception:
+    gr = None  # type: ignore[assignment]
+
+try:
-    from ..models import PythonCodeReviewAction, PythonCodeReviewObservation
+    from ..Models import PythonCodeReviewAction, PythonCodeReviewObservation
     from .env import PythonCodeReviewEnvironment
+    from .demo import build_demo
 except ImportError:
-    from models import PythonCodeReviewAction, PythonCodeReviewObservation
+    from Models import PythonCodeReviewAction, PythonCodeReviewObservation
     from server.env import PythonCodeReviewEnvironment
+    from server.demo import build_demo
+
+
+def build_application():
+    """Compose the OpenEnv API with the Gradio demo frontend."""
+
+    api_app = create_app(
+        PythonCodeReviewEnvironment,
+        PythonCodeReviewAction,
+        PythonCodeReviewObservation,
+        env_name="python_code_review_env",
+        max_concurrent_envs=4,
+    )
+    if gr is None:
+        return api_app
+    return gr.mount_gradio_app(api_app, build_demo(), path="/")
 
 
-app = create_app(
-    PythonCodeReviewEnvironment,
-    PythonCodeReviewAction,
-    PythonCodeReviewObservation,
-    env_name="python_code_review_env",
-    max_concurrent_envs=4,
-)
+app = build_application()
 
 
 def main(host: str = "0.0.0.0", port: int = 8000) -> None:
server/demo.py ADDED
@@ -0,0 +1,441 @@
+"""Gradio UI for TorchReview Copilot."""
+
+from __future__ import annotations
+
+from html import escape
+
+import gradio as gr
+
+try:
+    from ..triage import get_default_engine
+except ImportError:
+    from triage import get_default_engine
+
+
+CSS = """
+:root {
+  --paper: #f6f1e8;
+  --ink: #162521;
+  --accent: #d95d39;
+  --panel: #fffdf8;
+  --border: #d6c4b8;
+  --muted: #5f6f67;
+  --good: #2d7d62;
+  --warn: #b76516;
+  --high: #b23a48;
+}
+
+body, .gradio-container {
+  background:
+    radial-gradient(circle at top left, rgba(247, 197, 159, 0.35), transparent 35%),
+    linear-gradient(135deg, #f9f6ef 0%, #efe5d3 100%);
+  color: var(--ink);
+  font-family: Georgia, "Times New Roman", serif;
+}
+
+.gradio-container {
+  max-width: 1260px !important;
+}
+
+.hero-card,
+.metric-card,
+.subtle-card {
+  background: rgba(255, 253, 248, 0.95);
+  border: 1px solid var(--border);
+  border-radius: 20px;
+  box-shadow: 0 16px 40px rgba(22, 37, 33, 0.08);
+}
+
+.hero-card {
+  padding: 28px 30px;
+  margin-bottom: 12px;
+}
+
+.metric-card,
+.subtle-card {
+  padding: 20px 22px;
+}
+
+.eyebrow {
+  text-transform: uppercase;
+  letter-spacing: 0.12em;
+  font-size: 12px;
+  color: var(--accent);
+  margin-bottom: 10px;
+}
+
+.hero-title {
+  font-size: 44px;
+  line-height: 1.05;
+  margin: 0 0 10px;
+}
+
+.hero-copy {
+  margin: 0;
+  font-size: 18px;
+  line-height: 1.55;
+  color: var(--muted);
+}
+
+.summary-title {
+  display: flex;
+  justify-content: space-between;
+  gap: 12px;
+  align-items: center;
+  margin-bottom: 14px;
+}
+
+.pill {
+  display: inline-block;
+  padding: 6px 12px;
+  border-radius: 999px;
+  font-size: 12px;
+  text-transform: uppercase;
+  letter-spacing: 0.08em;
+  background: #efe5d3;
+}
+
+.pill.low { color: var(--good); }
+.pill.medium { color: var(--warn); }
+.pill.high { color: var(--high); }
+
+.summary-grid {
+  display: grid;
+  grid-template-columns: repeat(2, minmax(0, 1fr));
+  gap: 12px;
+  margin-top: 16px;
+}
+
+.summary-stat {
+  background: #fff7ef;
+  border-radius: 14px;
+  padding: 12px 14px;
+  border: 1px solid rgba(214, 196, 184, 0.8);
+}
+
+.summary-stat strong {
+  display: block;
+  font-size: 12px;
+  text-transform: uppercase;
+  letter-spacing: 0.08em;
+  color: var(--muted);
+  margin-bottom: 6px;
+}
+
+.radar-wrap {
+  display: grid;
+  gap: 12px;
+}
+
+.bar {
+  display: grid;
+  gap: 6px;
+}
+
+.bar-head {
+  display: flex;
+  justify-content: space-between;
+  font-size: 13px;
+  color: var(--muted);
+}
+
+.bar-track {
+  width: 100%;
+  height: 12px;
+  background: #f2e5d6;
+  border-radius: 999px;
+  overflow: hidden;
+}
+
+.bar-fill {
+  height: 100%;
+  border-radius: 999px;
+}
+
+.matched-box {
+  background: #fff7ef;
+  border: 1px solid rgba(214, 196, 184, 0.8);
+  border-radius: 16px;
+  padding: 14px;
+}
+
+.how-grid {
+  display: grid;
+  grid-template-columns: repeat(4, minmax(0, 1fr));
+  gap: 12px;
+}
+
+.how-step {
+  background: rgba(255, 253, 248, 0.9);
+  border: 1px solid var(--border);
+  border-radius: 18px;
+  padding: 16px;
+}
+
+@media (max-width: 900px) {
+  .hero-title {
+    font-size: 34px;
+  }
+
+  .summary-grid,
+  .how-grid {
+    grid-template-columns: 1fr;
+  }
+}
+"""
+
+
+def _default_outputs() -> tuple[str, str, str, str, str]:
+    return (
+        "<div class='metric-card'><div class='eyebrow'>Awaiting Analysis</div><p class='hero-copy'>Paste Python code, add an optional traceback, or load one of the built-in examples.</p></div>",
+        "<div class='metric-card'><div class='eyebrow'>Live Triage Radar</div><p class='hero-copy'>Confidence bars will appear after the first analysis run.</p></div>",
+        "### Improvement Plan\nAnalyze a sample to generate syntax, edge-case, and scalability recommendations.",
+        "### Known Pattern Match\nThe nearest OpenEnv task will be highlighted here after inference runs.",
+        "### Model Notes\nBackend and extracted signal details will appear here.",
+    )
+
+
+def _summary_html(result) -> str:
+    issue = escape(result.issue_label.title())
+    summary = escape(result.summary)
+    next_action = escape(result.suggested_next_action)
+    return f"""
+    <div class="metric-card">
+        <div class="summary-title">
+            <div>
+                <div class="eyebrow">TorchReview Verdict</div>
+                <h3 style="margin:0;font-size:30px;">{issue} Issue</h3>
+            </div>
+            <span class="pill {escape(result.repair_risk)}">{escape(result.repair_risk)} repair risk</span>
+        </div>
+        <p class="hero-copy">{summary}</p>
+        <div class="summary-grid">
+            <div class="summary-stat">
+                <strong>Reward Score</strong>
+                {result.reward_score:.0%}
+            </div>
+            <div class="summary-stat">
+                <strong>ML Quality</strong>
+                {result.ml_quality_score:.0%}
+            </div>
+            <div class="summary-stat">
+                <strong>Matched Pattern</strong>
+                {escape(result.matched_pattern.title)}
+            </div>
+            <div class="summary-stat">
+                <strong>Inference Backend</strong>
+                {escape(result.model_backend)}
+            </div>
+            <div class="summary-stat">
+                <strong>Lint Score</strong>
+                {result.lint_score:.0%}
+            </div>
+            <div class="summary-stat">
+                <strong>Complexity Penalty</strong>
+                {result.complexity_penalty:.0%}
+            </div>
+            <div class="summary-stat">
+                <strong>Next Action</strong>
+                {next_action}
+            </div>
+        </div>
+    </div>
+    """
+
+
+def _radar_html(result) -> str:
+    colors = {
+        "syntax": "#d95d39",
+        "logic": "#4f772d",
+        "performance": "#355070",
+    }
+    bars = []
+    for label, score in result.confidence_scores.items():
+        bars.append(
+            f"""
+            <div class="bar">
+                <div class="bar-head"><span>{escape(label.title())}</span><span>{score:.0%}</span></div>
+                <div class="bar-track">
+                    <div class="bar-fill" style="width:{score * 100:.1f}%; background:{colors.get(label, '#d95d39')};"></div>
+                </div>
+            </div>
+            """
+        )
+    return f"""
+    <div class="metric-card radar-wrap">
+        <div class="eyebrow">Live Triage Radar</div>
+        {''.join(bars)}
+        <div class="matched-box">
+            <strong>Nearest Known Pattern:</strong> {escape(result.matched_pattern.title)}<br>
+            <span style="color:#5f6f67;">{escape(result.matched_pattern.summary)}</span>
+        </div>
+    </div>
+    """
+
+
+def _plan_markdown(result) -> str:
+    plan_lines = "\n".join(f"{index + 1}. {step}" for index, step in enumerate(result.repair_plan))
+    return (
+        "### Improvement Plan\n"
+        f"**Primary issue:** `{result.issue_label}`\n\n"
+        f"{plan_lines}\n\n"
+        f"**Suggested next action:** {result.suggested_next_action}"
+    )
+
+
+def _match_markdown(result) -> str:
+    return (
+        "### Known Pattern Match\n"
+        f"**Task:** `{result.matched_pattern.task_id}` \n"
+        f"**Title:** {result.matched_pattern.title} \n"
+        f"**Why it matched:** {result.matched_pattern.rationale} \n"
+        f"**Similarity:** {result.matched_pattern.similarity:.0%}"
+    )
+
+
+def _model_markdown(result) -> str:
+    signal_lines = "\n".join(
+        f"- `{signal.name}` -> {signal.value} ({signal.impact}, weight {signal.weight:.2f}): {signal.evidence}"
+        for signal in result.extracted_signals
+    ) or "- No strong static signals were extracted."
+    notes = "\n".join(f"- {item}" for item in result.inference_notes) or "- No additional backend notes."
+    return (
+        "### Model Notes\n"
+        f"- **Model backend:** `{result.model_backend}`\n"
+        f"- **Model id:** `{result.model_id}`\n"
+        f"- **Analysis time:** `{result.analysis_time_ms:.2f} ms`\n\n"
+        "### Reward Formula\n"
+        f"- `reward = (0.5 x {result.ml_quality_score:.2f}) + (0.3 x {result.lint_score:.2f}) - (0.2 x {result.complexity_penalty:.2f})`\n"
+        f"- **Final reward:** `{result.reward_score:.2f}`\n\n"
+        "### Extracted Signals\n"
+        f"{signal_lines}\n\n"
+        "### Backend Notes\n"
+        f"{notes}"
+    )
+
+
+def analyze_inputs(code: str, traceback_text: str, context_window: str) -> tuple[str, str, str, str, str]:
+    """Run the triage engine and format outputs for the Gradio UI."""
+
+    result = get_default_engine().triage(code or "", traceback_text or "", context_window or "")
+    return (
+        _summary_html(result),
+        _radar_html(result),
+        _plan_markdown(result),
+        _match_markdown(result),
+        _model_markdown(result),
+    )
+
+
+def load_example(example_key: str) -> tuple[str, str, str, str, str, str, str, str, str]:
+    """Populate the UI from a built-in example and immediately analyze it."""
+
+    example = get_default_engine().example_map()[example_key]
+    outputs = analyze_inputs(example.code, example.traceback_text, example.context_window)
+    header = (
+        f"### Example Scenario\n"
+        f"**{example.title}** \n"
+        f"{example.summary} \n"
+        f"Label target: `{example.label}`"
+    )
+    return (example.code, example.traceback_text, example.context_window, header, *outputs)
+
+
+def build_demo() -> gr.Blocks:
+    """Create the TorchReview Copilot Gradio application."""
+
+    examples = get_default_engine().example_map()
+    first_example = next(iter(examples.values()))
+
+    with gr.Blocks(theme=gr.themes.Soft(primary_hue="orange", secondary_hue="amber"), css=CSS, title="TorchReview Copilot") as demo:
+        gr.HTML(
+            """
+            <div class="hero-card">
+                <div class="eyebrow">Meta PyTorch OpenEnv Hackathon Demo</div>
+                <h1 class="hero-title">TorchReview Copilot</h1>
+                <p class="hero-copy">
+                    AI-powered code review and improvement system using PyTorch to score code quality, surface bugs,
+                    and generate a three-step improvement plan. OpenEnv stays underneath as the deterministic validation engine.
+                </p>
+            </div>
+            """
+        )
+
+        with gr.Row():
+            with gr.Column(scale=6):
+                example_choice = gr.Radio(
+                    choices=[(item.title, item.key) for item in examples.values()],
+                    value=first_example.key,
+                    label="Try a built-in failure scenario",
+                    info="Switching examples updates the Live Triage Radar immediately.",
+                )
+                example_header = gr.Markdown()
+                code_input = gr.Code(
+                    value=first_example.code,
+                    language="python",
+                    lines=18,
+                    label="Python code under review",
+                )
+                traceback_input = gr.Textbox(
+                    value=first_example.traceback_text,
+                    lines=7,
+                    label="Optional traceback / failing test output",
+                    placeholder="Paste stack traces, assertion failures, or benchmark notes here.",
+                )
+                context_input = gr.Textbox(
+                    value=first_example.context_window,
+                    lines=4,
+                    label="Context window",
+                    placeholder="Describe expected behavior, constraints, or repository context.",
+                )
+                with gr.Row():
+                    analyze_button = gr.Button("Analyze & Score Code", variant="primary")
+                    clear_button = gr.Button("Clear Inputs", variant="secondary")
+
+            with gr.Column(scale=5):
+                summary_html = gr.HTML()
+                radar_html = gr.HTML()
+                plan_markdown = gr.Markdown()
+                match_markdown = gr.Markdown()
+                model_markdown = gr.Markdown()
+
+        gr.HTML(
+            """
+            <div class="subtle-card" style="margin-top: 12px;">
+                <div class="eyebrow">How It Works</div>
+                <div class="how-grid">
+                    <div class="how-step"><strong>Input</strong><br>Code plus optional traceback or benchmark signal.</div>
+                    <div class="how-step"><strong>Processing</strong><br>Static checks extract parser, lint, complexity, and runtime clues.</div>
+                    <div class="how-step"><strong>Model</strong><br>CodeBERTa embeddings run through PyTorch and score code quality against known OpenEnv patterns.</div>
+                    <div class="how-step"><strong>Output</strong><br>Confidence radar, reward score, and a three-step improvement plan.</div>
+                </div>
+            </div>
+            """
+        )
+
+        example_choice.change(
+            fn=load_example,
+            inputs=example_choice,
+            outputs=[code_input, traceback_input, context_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+            show_progress="hidden",
+        )
+        analyze_button.click(
+            fn=analyze_inputs,
+            inputs=[code_input, traceback_input, context_input],
+            outputs=[summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+            show_progress="minimal",
+        )
+        clear_button.click(
+            fn=lambda: ("", "", "", "### Example Scenario\nChoose a built-in example or paste custom code.", *_default_outputs()),
+            inputs=None,
+            outputs=[code_input, traceback_input, context_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+            show_progress="hidden",
+        )
+        demo.load(
+            fn=load_example,
+            inputs=example_choice,
+            outputs=[code_input, traceback_input, context_input, example_header, summary_html, radar_html, plan_markdown, match_markdown, model_markdown],
+            show_progress="hidden",
+        )
+
+    return demo
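The Reward Formula panel in `_model_markdown` displays the scoring blend `reward = 0.5 x ml_quality + 0.3 x lint - 0.2 x complexity_penalty`. A sketch that reproduces just that arithmetic (the clamp to [0, 1] is an assumption inferred from `ScoreBreakdown`'s `ge=0.0, le=1.0` bounds; only the weights appear in the UI string):

```python
def reward(ml_quality: float, lint: float, complexity_penalty: float) -> float:
    """Blend the three signals with the weights shown in the Model Notes panel."""
    raw = 0.5 * ml_quality + 0.3 * lint - 0.2 * complexity_penalty
    return max(0.0, min(1.0, raw))  # assumed clamp; ScoreBreakdown requires [0, 1]

# 0.5*0.8 + 0.3*0.9 - 0.2*0.25 = 0.40 + 0.27 - 0.05
print(round(reward(0.8, 0.9, 0.25), 2))  # → 0.62
print(reward(0.0, 0.0, 1.0))             # → 0.0 (clamped from -0.2)
```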
server/env.py CHANGED
@@ -11,7 +11,7 @@ from openenv.core.env_server.types import EnvironmentMetadata
 try:
     from ..graders import grade_task
     from ..graders.shared import component_score, safe_ratio, strict_score
-    from ..models import (
+    from ..Models import (
         HistoryEntry,
         PythonCodeReviewAction,
         PythonCodeReviewObservation,
@@ -23,7 +23,7 @@ try:
 except ImportError:
     from graders import grade_task
     from graders.shared import component_score, safe_ratio, strict_score
-    from models import (
+    from Models import (
         HistoryEntry,
         PythonCodeReviewAction,
         PythonCodeReviewObservation,
server/requirements.txt CHANGED
@@ -1,5 +1,9 @@
 openenv-core[core]>=0.2.2
 fastapi>=0.111.0
+gradio>=5.26.0
 uvicorn>=0.30.0
 pytest>=8.0.0
 openai>=1.76.0
+streamlit>=1.44.0
+torch>=2.2.0
+transformers>=4.45.0
services/__init__.py ADDED
@@ -0,0 +1,7 @@
+ """Service layer for orchestrating analysis, suggestions, and rewards."""
+
+ from .analysis_service import AnalysisService
+ from .reward_service import RewardService
+ from .suggestion_service import SuggestionService
+
+ __all__ = ["AnalysisService", "RewardService", "SuggestionService"]
services/analysis_service.py ADDED
@@ -0,0 +1,133 @@
+ """Orchestration layer for multi-domain code analysis."""
+
+ from __future__ import annotations
+
+ import time
+ from typing import Any, Callable, Dict
+
+ from analyzers import analyze_data_science_code, analyze_dsa_code, analyze_ml_code, analyze_web_code
+ from models import PyTorchCodeAnalyzerModel
+ from schemas.request import AnalyzeCodeRequest
+ from schemas.response import AnalyzeCodeResponse, DomainAnalysis, StaticAnalysisSummary
+ from services.reward_service import RewardService
+ from services.suggestion_service import SuggestionService
+ from utils import estimate_complexity, parse_code_structure
+
+
+ def _lint_score(parsed: Dict[str, Any]) -> float:
+     """Convert structural smells into a normalized lint-style score."""
+
+     score = 1.0
+     if not parsed.get("syntax_valid", True):
+         score -= 0.45
+     score -= min(parsed.get("long_lines", 0), 5) * 0.03
+     if parsed.get("tabs_used"):
+         score -= 0.1
+     if parsed.get("trailing_whitespace_lines"):
+         score -= 0.05
+     if parsed.get("docstring_ratio", 0.0) == 0.0 and parsed.get("function_names"):
+         score -= 0.08
+     return round(max(0.0, min(1.0, score)), 4)
+
+
+ class AnalysisService:
+     """End-to-end analysis pipeline shared by API and UI."""
+
+     def __init__(self) -> None:
+         self.model = PyTorchCodeAnalyzerModel()
+         self.reward_service = RewardService()
+         self.suggestion_service = SuggestionService()
+         self._analyzers: Dict[str, Callable[[str, Dict[str, Any], Dict[str, Any]], DomainAnalysis]] = {
+             "dsa": analyze_dsa_code,
+             "data_science": analyze_data_science_code,
+             "ml_dl": analyze_ml_code,
+             "web": analyze_web_code,
+         }
+
+     def _heuristic_domain_scores(self, parsed: Dict[str, Any], code: str) -> Dict[str, float]:
+         """Derive domain priors from imports and syntax-level hints."""
+
+         scores = {
+             "dsa": 0.2 + (0.15 if parsed.get("uses_recursion") else 0.0) + (0.15 if parsed.get("max_loop_depth", 0) >= 1 else 0.0),
+             "data_science": 0.2 + (0.35 if parsed.get("uses_pandas") or parsed.get("uses_numpy") else 0.0),
+             "ml_dl": 0.2 + (0.35 if parsed.get("uses_torch") or parsed.get("uses_sklearn") else 0.0),
+             "web": 0.2 + (0.35 if parsed.get("uses_fastapi") or parsed.get("uses_flask") else 0.0) + (0.1 if parsed.get("route_decorators") else 0.0),
+             "general": 0.2,
+         }
+         if "fastapi" in code.lower():
+             scores["web"] += 0.1
+         if "pandas" in code.lower() or "numpy" in code.lower():
+             scores["data_science"] += 0.1
+         if "torch" in code.lower():
+             scores["ml_dl"] += 0.1
+         if "while" in code or "for" in code:
+             scores["dsa"] += 0.05
+         return {key: round(min(value, 0.99), 4) for key, value in scores.items()}
+
+     def analyze(self, request: AnalyzeCodeRequest) -> AnalyzeCodeResponse:
+         """Run the complete multi-domain analysis pipeline."""
+
+         started = time.perf_counter()
+         parsed = parse_code_structure(request.code)
+         complexity = estimate_complexity(parsed, request.code)
+         model_prediction = self.model.predict(request.code, request.context_window, parsed)
+         heuristic_scores = self._heuristic_domain_scores(parsed, request.code)
+
+         combined_scores = {}
+         for domain, heuristic_score in heuristic_scores.items():
+             model_score = float(model_prediction["domain_scores"].get(domain, 0.2))
+             combined_scores[domain] = round((0.6 * model_score) + (0.4 * heuristic_score), 4)
+
+         detected_domain = request.domain_hint if request.domain_hint != "auto" else max(combined_scores, key=combined_scores.get)
+         analyzer = self._analyzers.get(detected_domain)
+         domain_analysis = (
+             analyzer(request.code, parsed, complexity)
+             if analyzer is not None
+             else DomainAnalysis(
+                 domain="general",
+                 domain_score=0.6,
+                 issues=[],
+                 suggestions=["Add stronger domain-specific context for deeper analysis."],
+                 highlights={},
+             )
+         )
+
+         lint_score = _lint_score(parsed)
+         score_breakdown = self.reward_service.compute(
+             ml_score=float(model_prediction["ml_quality_score"]),
+             domain_score=domain_analysis.domain_score,
+             lint_score=lint_score,
+             complexity_penalty=float(complexity["complexity_penalty"]),
+         )
+         static_analysis = StaticAnalysisSummary(
+             syntax_valid=bool(parsed["syntax_valid"]),
+             syntax_error=str(parsed["syntax_error"]),
+             cyclomatic_complexity=int(complexity["cyclomatic_complexity"]),
+             line_count=int(parsed["line_count"]),
+             max_loop_depth=int(parsed["max_loop_depth"]),
+             time_complexity=str(complexity["time_complexity"]),
+             space_complexity=str(complexity["space_complexity"]),
+             detected_imports=list(parsed["imports"]),
+             code_smells=list(parsed["code_smells"]),
+         )
+         improvement_plan = self.suggestion_service.build_improvement_plan(
+             domain_analysis=domain_analysis,
+             static_analysis=static_analysis,
+         )
+         summary = (
+             f"Detected `{detected_domain}` code with a model score of {score_breakdown.ml_score:.0%}, "
+             f"domain score {score_breakdown.domain_score:.0%}, and final reward {score_breakdown.reward:.0%}."
+         )
+         return AnalyzeCodeResponse(
+             detected_domain=detected_domain,  # type: ignore[arg-type]
+             domain_confidences=combined_scores,
+             score_breakdown=score_breakdown,
+             static_analysis=static_analysis,
+             domain_analysis=domain_analysis,
+             improvement_plan=improvement_plan,
+             model_backend=str(model_prediction["backend_name"]),
+             model_id=str(model_prediction["model_id"]),
+             summary=summary,
+             context_window=request.context_window,
+             analysis_time_ms=round((time.perf_counter() - started) * 1000.0, 2),
+         )
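The 60/40 blend of model prediction and heuristic priors in `analyze` is easy to verify in isolation. Below is a minimal sketch of just that combination step; the `blend_domain_scores` helper name and the sample scores are illustrative, not part of the repo:

```python
def blend_domain_scores(model_scores, heuristic_scores):
    """Blend model and heuristic domain scores, mirroring the weighting in analyze()."""
    combined = {}
    for domain, heuristic in heuristic_scores.items():
        # Domains the model did not score fall back to the 0.2 prior.
        model = float(model_scores.get(domain, 0.2))
        combined[domain] = round((0.6 * model) + (0.4 * heuristic), 4)
    return combined

scores = blend_domain_scores({"web": 0.9}, {"web": 0.65, "dsa": 0.25})
print(scores)                       # {'web': 0.8, 'dsa': 0.22}
print(max(scores, key=scores.get))  # 'web' wins the domain detection
```

Because the model score carries the larger weight, a confident model prediction can override weak import-based heuristics, while unscored domains decay toward the shared prior.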
services/reward_service.py ADDED
@@ -0,0 +1,27 @@
+ """Reward shaping logic for RL-ready code analysis scores."""
+
+ from __future__ import annotations
+
+ from schemas.response import ScoreBreakdown
+
+
+ class RewardService:
+     """Compute reward scores from model, domain, lint, and complexity signals."""
+
+     def compute(self, *, ml_score: float, domain_score: float, lint_score: float, complexity_penalty: float) -> ScoreBreakdown:
+         """Apply the weighted reward formula and clamp the result."""
+
+         reward = max(
+             0.0,
+             min(
+                 1.0,
+                 (0.4 * ml_score) + (0.2 * domain_score) + (0.2 * lint_score) - (0.2 * complexity_penalty),
+             ),
+         )
+         return ScoreBreakdown(
+             ml_score=round(ml_score, 4),
+             domain_score=round(domain_score, 4),
+             lint_score=round(lint_score, 4),
+             complexity_penalty=round(complexity_penalty, 4),
+             reward=round(reward, 4),
+         )
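The weighted formula in `RewardService.compute` can be exercised on its own. The sketch below reproduces just the arithmetic, returning a plain float instead of the repo's `ScoreBreakdown` model. Note that the positive weights sum to 0.8, so even perfect signals cap the reward at 0.8 before the clamp ever matters:

```python
def compute_reward(ml_score, domain_score, lint_score, complexity_penalty):
    # Same weights as RewardService.compute: the ML quality score dominates,
    # domain fit and lint quality share the middle, and complexity subtracts.
    raw = (0.4 * ml_score) + (0.2 * domain_score) + (0.2 * lint_score) - (0.2 * complexity_penalty)
    return round(max(0.0, min(1.0, raw)), 4)

print(compute_reward(1.0, 1.0, 1.0, 0.0))  # 0.8 - the positive weights sum to 0.8
print(compute_reward(0.5, 0.5, 0.5, 0.5))  # 0.3
print(compute_reward(0.0, 0.0, 0.0, 1.0))  # 0.0 - clamped; the raw value was -0.2
```

The asymmetric clamp means the complexity penalty can zero out a reward but never drive it negative, which keeps the value usable as an RL reward signal.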
services/suggestion_service.py ADDED
@@ -0,0 +1,28 @@
+ """Suggestion and improvement-plan generation for analyzed code."""
+
+ from __future__ import annotations
+
+ from schemas.response import DomainAnalysis, StaticAnalysisSummary
+
+
+ class SuggestionService:
+     """Build high-signal improvement steps from analysis output."""
+
+     def build_improvement_plan(self, *, domain_analysis: DomainAnalysis, static_analysis: StaticAnalysisSummary) -> list[str]:
+         """Return a compact three-step plan optimized for developer action."""
+
+         primary_issue = (
+             domain_analysis.issues[0].description
+             if domain_analysis.issues
+             else "Stabilize correctness first and keep the public behavior explicit."
+         )
+
+         step_one = f"Step 1 - Correctness and safety: {primary_issue}"
+         step_two = "Step 2 - Edge cases: test empty inputs, boundary values, malformed payloads, and failure-mode behavior explicitly."
+         step_three = "Step 3 - Scalability: reduce repeated scans, lower cyclomatic complexity, and benchmark the path on realistic input sizes."
+
+         if domain_analysis.suggestions:
+             step_three = f"{step_three} Priority hint: {domain_analysis.suggestions[0]}"
+         if not static_analysis.syntax_valid:
+             step_one = f"Step 1 - Correctness and safety: fix the syntax error first ({static_analysis.syntax_error})."
+         return [step_one, step_two, step_three]
tests/test_multi_domain_platform.py ADDED
@@ -0,0 +1,52 @@
+ from __future__ import annotations
+
+ from fastapi.testclient import TestClient
+
+ from api.main import app
+ from schemas.request import AnalyzeCodeRequest
+ from services.analysis_service import AnalysisService
+
+
+ def test_analysis_service_detects_web_code() -> None:
+     service = AnalysisService()
+     request = AnalyzeCodeRequest(
+         code="from fastapi import FastAPI\napp = FastAPI()\n\n@app.get('/health')\ndef health():\n    return {'status': 'ok'}\n",
+         domain_hint="auto",
+     )
+
+     result = service.analyze(request)
+
+     assert result.detected_domain == "web"
+     assert 0.0 <= result.score_breakdown.reward <= 1.0
+     assert len(result.improvement_plan) == 3
+
+
+ def test_analysis_service_detects_dsa_code() -> None:
+     service = AnalysisService()
+     request = AnalyzeCodeRequest(
+         code="def has_pair(nums, target):\n    for i in range(len(nums)):\n        for j in range(i + 1, len(nums)):\n            if nums[i] + nums[j] == target:\n                return True\n    return False\n",
+         domain_hint="auto",
+     )
+
+     result = service.analyze(request)
+
+     assert result.detected_domain == "dsa"
+     assert result.static_analysis.time_complexity in {"O(n^2)", "O(n^3)"}
+
+
+ def test_api_analyze_endpoint_returns_valid_payload() -> None:
+     client = TestClient(app)
+     response = client.post(
+         "/analyze",
+         json={
+             "code": "import torch\n\ndef predict(model, x):\n    return model(x)\n",
+             "context_window": "Inference helper for a classifier",
+             "traceback_text": "",
+             "domain_hint": "auto",
+         },
+     )
+
+     assert response.status_code == 200
+     payload = response.json()
+     assert "detected_domain" in payload
+     assert "score_breakdown" in payload
@@ -1,7 +1,7 @@
1
  from __future__ import annotations
2
 
3
  from graders import grade_task
4
- from models import PythonCodeReviewAction
5
  from server.env import PythonCodeReviewEnvironment
6
  from tasks import list_tasks
7
 
 
1
  from __future__ import annotations
2
 
3
  from graders import grade_task
4
+ from Models import PythonCodeReviewAction
5
  from server.env import PythonCodeReviewEnvironment
6
  from tasks import list_tasks
7
 
tests/test_triage_pipeline.py ADDED
@@ -0,0 +1,46 @@
+ from __future__ import annotations
+
+ from fastapi.testclient import TestClient
+
+ from triage import CodeTriageEngine, HashingEmbeddingBackend
+ from triage_catalog import build_examples
+
+
+ def test_hashing_backend_returns_normalized_embeddings() -> None:
+     backend = HashingEmbeddingBackend(dimensions=32)
+     embeddings = backend.embed_texts(["def foo():\n    return 1", "for x in items:\n    pass"])
+
+     assert embeddings.shape == (2, 32)
+     for row in embeddings:
+         assert round(float(row.norm().item()), 5) == 1.0
+
+
+ def test_examples_map_to_expected_labels_with_fallback_backend() -> None:
+     examples = build_examples()
+     engine = CodeTriageEngine(backend=HashingEmbeddingBackend())
+
+     for example in examples:
+         result = engine.triage(example.code, example.traceback_text, example.context_window)
+         assert result.issue_label == example.label
+         assert 0.0 <= result.reward_score <= 1.0
+
+
+ def test_syntax_example_exposes_parser_signal() -> None:
+     example = next(item for item in build_examples() if item.label == "syntax")
+     engine = CodeTriageEngine(backend=HashingEmbeddingBackend())
+
+     result = engine.triage(example.code, example.traceback_text, example.context_window)
+
+     assert any(signal.name == "syntax_parse" and signal.value == "fails" for signal in result.extracted_signals)
+     assert result.matched_pattern.task_id == example.task_id
+     assert result.repair_plan[0].startswith("Step 1 - Syntax checking and bug fixes")
+
+
+ def test_composed_app_preserves_health_route() -> None:
+     from server.app import build_application
+
+     client = TestClient(build_application())
+     response = client.get("/health")
+
+     assert response.status_code == 200
+     assert response.json()["status"] == "ok"
triage.py ADDED
@@ -0,0 +1,473 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """PyTorch-backed triage pipeline for TorchReview Copilot."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import ast
6
+ import hashlib
7
+ import os
8
+ import re
9
+ import time
10
+ from functools import lru_cache
11
+ from typing import List, Sequence
12
+
13
+ import torch
14
+ import torch.nn.functional as F
15
+
16
+ try:
17
+ from transformers import AutoModel, AutoTokenizer
18
+ except Exception:
19
+ AutoModel = None # type: ignore[assignment]
20
+ AutoTokenizer = None # type: ignore[assignment]
21
+
22
+ try:
23
+ from .triage_catalog import build_examples, build_prototypes
24
+ from .triage_models import (
25
+ IssueLabel,
26
+ PrototypeMatch,
27
+ TriageExample,
28
+ TriagePrototype,
29
+ TriageResult,
30
+ TriageSignal,
31
+ )
32
+ except ImportError:
33
+ from triage_catalog import build_examples, build_prototypes
34
+ from triage_models import (
35
+ IssueLabel,
36
+ PrototypeMatch,
37
+ TriageExample,
38
+ TriagePrototype,
39
+ TriageResult,
40
+ TriageSignal,
41
+ )
42
+
43
+
44
+ MODEL_ID = os.getenv("TRIAGE_MODEL_ID", "huggingface/CodeBERTa-small-v1")
45
+ MODEL_MAX_LENGTH = int(os.getenv("TRIAGE_MODEL_MAX_LENGTH", "256"))
46
+ LABELS: tuple[IssueLabel, ...] = ("syntax", "logic", "performance")
47
+
48
+
49
+ class _LoopDepthVisitor(ast.NodeVisitor):
50
+ """Track the maximum loop nesting depth in a code snippet."""
51
+
52
+ def __init__(self) -> None:
53
+ self.depth = 0
54
+ self.max_depth = 0
55
+
56
+ def _visit_loop(self, node: ast.AST) -> None:
57
+ self.depth += 1
58
+ self.max_depth = max(self.max_depth, self.depth)
59
+ self.generic_visit(node)
60
+ self.depth -= 1
61
+
62
+ def visit_For(self, node: ast.For) -> None: # noqa: N802
63
+ self._visit_loop(node)
64
+
65
+ def visit_While(self, node: ast.While) -> None: # noqa: N802
66
+ self._visit_loop(node)
67
+
68
+ def visit_comprehension(self, node: ast.comprehension) -> None: # noqa: N802
69
+ self._visit_loop(node)
70
+
71
+
72
+ class HashingEmbeddingBackend:
73
+ """Deterministic torch-native fallback when pretrained weights are unavailable."""
74
+
75
+ def __init__(self, dimensions: int = 96) -> None:
76
+ self.dimensions = dimensions
77
+ self.model_id = "hashed-token-fallback"
78
+ self.backend_name = "hashed-token-fallback"
79
+ self.notes = ["Using hashed torch embeddings because pretrained weights are unavailable."]
80
+
81
+ def embed_texts(self, texts: Sequence[str]) -> torch.Tensor:
82
+ rows = torch.zeros((len(texts), self.dimensions), dtype=torch.float32)
83
+ for row_index, text in enumerate(texts):
84
+ tokens = re.findall(r"[A-Za-z_]+|\d+|==|!=|<=|>=|\S", text.lower())[:512]
85
+ if not tokens:
86
+ rows[row_index, 0] = 1.0
87
+ continue
88
+ for token in tokens:
89
+ digest = hashlib.md5(token.encode("utf-8")).hexdigest()
90
+ bucket = int(digest[:8], 16) % self.dimensions
91
+ sign = -1.0 if int(digest[8:10], 16) % 2 else 1.0
92
+ rows[row_index, bucket] += sign
93
+ return F.normalize(rows + 1e-6, dim=1)
94
+
95
+
96
+ class TransformersEmbeddingBackend:
97
+ """Mean-pool CodeBERTa embeddings via torch + transformers."""
98
+
99
+ def __init__(self, model_id: str = MODEL_ID, force_fallback: bool = False) -> None:
100
+ self.model_id = model_id
101
+ self.force_fallback = force_fallback
102
+ self.backend_name = model_id
103
+ self.notes: List[str] = []
104
+ self._fallback = HashingEmbeddingBackend()
105
+ self._tokenizer = None
106
+ self._model = None
107
+ self._load_error = ""
108
+ if force_fallback:
109
+ self.backend_name = self._fallback.backend_name
110
+ self.notes = list(self._fallback.notes)
111
+
112
+ def _ensure_loaded(self) -> None:
113
+ if self.force_fallback or self._model is not None or self._load_error:
114
+ return
115
+ if AutoTokenizer is None or AutoModel is None:
116
+ self._load_error = "transformers is not installed."
117
+ else:
118
+ try:
119
+ self._tokenizer = AutoTokenizer.from_pretrained(self.model_id)
120
+ self._model = AutoModel.from_pretrained(self.model_id)
121
+ self._model.eval()
122
+ self.notes.append(f"Loaded pretrained encoder `{self.model_id}` for inference.")
123
+ except Exception as exc:
124
+ self._load_error = f"{type(exc).__name__}: {exc}"
125
+
126
+ if self._load_error:
127
+ self.backend_name = self._fallback.backend_name
128
+ self.notes = list(self._fallback.notes) + [f"Pretrained load failed: {self._load_error}"]
129
+
130
+ def embed_texts(self, texts: Sequence[str]) -> torch.Tensor:
131
+ self._ensure_loaded()
132
+ if self._model is None or self._tokenizer is None:
133
+ return self._fallback.embed_texts(texts)
134
+
135
+ encoded = self._tokenizer(
136
+ list(texts),
137
+ padding=True,
138
+ truncation=True,
139
+ max_length=MODEL_MAX_LENGTH,
140
+ return_tensors="pt",
141
+ )
142
+ with torch.no_grad():
143
+ outputs = self._model(**encoded)
144
+ hidden_state = outputs.last_hidden_state
145
+ mask = encoded["attention_mask"].unsqueeze(-1)
146
+ pooled = (hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
147
+ return F.normalize(pooled, dim=1)
148
+
149
+
150
+ def _sanitize_text(value: str) -> str:
151
+ text = (value or "").strip()
152
+ return text[:4000]
153
+
154
+
155
+ def _safe_softmax(scores: dict[IssueLabel, float]) -> dict[str, float]:
156
+ tensor = torch.tensor([scores[label] for label in LABELS], dtype=torch.float32)
157
+ probabilities = torch.softmax(tensor * 4.0, dim=0)
158
+ return {label: round(float(probabilities[index]), 4) for index, label in enumerate(LABELS)}
159
+
160
+
161
+ def _loop_depth(code: str) -> int:
162
+ try:
163
+ tree = ast.parse(code)
164
+ except SyntaxError:
165
+ return 0
166
+ visitor = _LoopDepthVisitor()
167
+ visitor.visit(tree)
168
+ return visitor.max_depth
169
+
170
+
171
+ def _repair_risk(label: IssueLabel, confidence: float, signal_count: int) -> str:
172
+ base = {"syntax": 0.25, "logic": 0.55, "performance": 0.7}[label]
173
+ if confidence < 0.55:
174
+ base += 0.12
175
+ if signal_count >= 4:
176
+ base += 0.08
177
+ if base < 0.4:
178
+ return "low"
179
+ if base < 0.72:
180
+ return "medium"
181
+ return "high"
182
+
183
+
184
+ def _clamp_unit(value: float) -> float:
185
+ return round(max(0.0, min(1.0, float(value))), 4)
186
+
187
+
188
+ def _lint_score(code: str) -> float:
189
+ stripped_lines = [line.rstrip("\n") for line in code.splitlines()]
190
+ if not stripped_lines:
191
+ return 0.2
192
+
193
+ score = 1.0
194
+ if any(len(line) > 88 for line in stripped_lines):
195
+ score -= 0.15
196
+ if any(line.rstrip() != line for line in stripped_lines):
197
+ score -= 0.1
198
+ if any("\t" in line for line in stripped_lines):
199
+ score -= 0.1
200
+ try:
201
+ tree = ast.parse(code)
202
+ functions = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
203
+ if functions and not ast.get_docstring(functions[0]):
204
+ score -= 0.08
205
+ except SyntaxError:
206
+ score -= 0.45
207
+ return _clamp_unit(score)
208
+
209
+
210
+ def _complexity_penalty(code: str) -> float:
211
+ try:
212
+ tree = ast.parse(code)
213
+ except SyntaxError:
214
+ return 0.95
215
+ branch_nodes = sum(isinstance(node, (ast.If, ast.For, ast.While, ast.Try, ast.Match)) for node in ast.walk(tree))
216
+ loop_depth = _loop_depth(code)
217
+ penalty = 0.1 + min(branch_nodes, 8) * 0.07 + min(loop_depth, 4) * 0.12
218
+ return _clamp_unit(penalty)
219
+
220
+
221
+ class CodeTriageEngine:
222
+ """Combine static signals with PyTorch embeddings to classify code issues."""
223
+
224
+ def __init__(
225
+ self,
226
+ *,
227
+ backend: TransformersEmbeddingBackend | HashingEmbeddingBackend | None = None,
228
+ prototypes: Sequence[TriagePrototype] | None = None,
229
+ examples: Sequence[TriageExample] | None = None,
230
+ ) -> None:
231
+ self.backend = backend or TransformersEmbeddingBackend()
232
+ self.prototypes = list(prototypes or build_prototypes())
233
+ self.examples = list(examples or build_examples())
234
+ self._prototype_matrix: torch.Tensor | None = None
235
+ self._reference_code_matrix: torch.Tensor | None = None
236
+
237
+ def example_map(self) -> dict[str, TriageExample]:
238
+ """Return UI examples keyed by task id."""
239
+
240
+ return {example.key: example for example in self.examples}
241
+
242
+ def _build_document(self, code: str, traceback_text: str) -> str:
243
+ trace = _sanitize_text(traceback_text) or "No traceback supplied."
244
+ snippet = _sanitize_text(code) or "# No code supplied."
245
+ return f"Candidate code:\n{snippet}\n\nObserved failure:\n{trace}\n"
246
+
247
+ def _build_review_document(self, code: str, traceback_text: str, context_window: str) -> str:
248
+ context = _sanitize_text(context_window) or "No additional context window supplied."
249
+ return (
250
+ f"{self._build_document(code, traceback_text)}\n"
251
+ f"Context window:\n{context}\n"
252
+ )
253
+
254
+ def _prototype_embeddings(self) -> torch.Tensor:
255
+ if self._prototype_matrix is None:
256
+ reference_texts = [prototype.reference_text for prototype in self.prototypes]
257
+ self._prototype_matrix = self.backend.embed_texts(reference_texts)
258
+ return self._prototype_matrix
259
+
260
+ def _reference_code_embeddings(self) -> torch.Tensor:
261
+ if self._reference_code_matrix is None:
262
+ reference_codes = [prototype.reference_code for prototype in self.prototypes]
263
+ self._reference_code_matrix = self.backend.embed_texts(reference_codes)
264
+ return self._reference_code_matrix
265
+
266
+ def _extract_signals(self, code: str, traceback_text: str) -> tuple[list[TriageSignal], dict[IssueLabel, float], list[str]]:
267
+ trace = (traceback_text or "").lower()
268
+ heuristic_scores: dict[IssueLabel, float] = {label: 0.15 for label in LABELS}
269
+ signals: list[TriageSignal] = []
270
+ notes: list[str] = []
271
+
272
+ try:
273
+ ast.parse(code)
274
+ signals.append(
275
+ TriageSignal(
276
+ name="syntax_parse",
277
+ value="passes",
278
+ impact="syntax",
279
+ weight=0.1,
280
+ evidence="Python AST parsing succeeded.",
281
+ )
282
+ )
283
+ heuristic_scores["logic"] += 0.05
284
+ except SyntaxError as exc:
285
+ evidence = f"{exc.msg} at line {exc.lineno}"
286
+ signals.append(
287
+ TriageSignal(
288
+ name="syntax_parse",
289
+ value="fails",
290
+ impact="syntax",
291
+ weight=0.95,
292
+ evidence=evidence,
293
+ )
294
+ )
295
+ heuristic_scores["syntax"] += 0.85
296
+ notes.append(f"Parser failure detected: {evidence}")
297
+
298
+ if any(token in trace for token in ("syntaxerror", "indentationerror", "expected ':'")):
299
+ signals.append(
300
+ TriageSignal(
301
+ name="traceback_keyword",
302
+ value="syntaxerror",
303
+ impact="syntax",
304
+ weight=0.8,
305
+ evidence="Traceback contains a parser error.",
306
+ )
307
+ )
308
+ heuristic_scores["syntax"] += 0.55
309
+
310
+ if any(token in trace for token in ("assertionerror", "expected:", "actual:", "boundary", "missing", "incorrect")):
311
+ signals.append(
312
+ TriageSignal(
313
+ name="test_failure_signal",
314
+ value="assertion-style failure",
315
+ impact="logic",
316
+ weight=0.7,
317
+ evidence="Failure text points to behavioral mismatch instead of parser issues.",
318
+ )
319
+ )
320
+ heuristic_scores["logic"] += 0.55
321
+
322
+ if any(token in trace for token in ("timeout", "benchmark", "slow", "latency", "performance", "profiler")):
323
+ signals.append(
324
+ TriageSignal(
325
+ name="performance_trace",
326
+ value="latency regression",
327
+ impact="performance",
328
+ weight=0.85,
329
+ evidence="Traceback mentions benchmark or latency pressure.",
330
+ )
331
+ )
332
+ heuristic_scores["performance"] += 0.7
333
+
334
+ loop_depth = _loop_depth(code)
335
+ if loop_depth >= 2:
336
+ signals.append(
337
+ TriageSignal(
338
+ name="loop_depth",
339
+ value=str(loop_depth),
340
+ impact="performance",
341
+ weight=0.65,
342
+ evidence="Nested iteration increases runtime risk on larger fixtures.",
343
+ )
344
+ )
345
+ heuristic_scores["performance"] += 0.35
346
+
347
+ if "Counter(" in code or "defaultdict(" in code or "set(" in code:
348
+ heuristic_scores["performance"] += 0.05
349
+
350
+ if "return sessions" in code and "sessions.append" not in code:
351
+ signals.append(
352
+ TriageSignal(
353
+ name="state_update_gap",
354
+ value="possible missing final append",
355
+ impact="logic",
356
+ weight=0.45,
357
+ evidence="A collection is returned without an obvious final state flush.",
358
+ )
359
+ )
360
+ heuristic_scores["logic"] += 0.18
361
+
362
+ return signals, heuristic_scores, notes
363
+
364
+ def _nearest_match(self, embedding: torch.Tensor) -> tuple[TriagePrototype, float, dict[str, float]]:
365
+ similarities = torch.matmul(embedding, self._prototype_embeddings().T)[0]
366
+ indexed_scores = {
367
+ self.prototypes[index].task_id: round(float((similarities[index] + 1.0) / 2.0), 4)
368
+ for index in range(len(self.prototypes))
369
+ }
370
+ best_index = int(torch.argmax(similarities).item())
371
+ best_prototype = self.prototypes[best_index]
372
+ best_similarity = float((similarities[best_index] + 1.0) / 2.0)
373
+ return best_prototype, best_similarity, indexed_scores
374
+
375
+ def _repair_plan(self, label: IssueLabel, matched: TriagePrototype, context_window: str) -> list[str]:
376
+ context = _sanitize_text(context_window)
377
+ step_one = {
378
+ "syntax": "Step 1 - Syntax checking and bug fixes: resolve the parser break before touching behavior, then align the function with the expected contract.",
379
+ "logic": "Step 1 - Syntax checking and bug fixes: confirm the code parses cleanly, then patch the failing branch or state update causing the incorrect result.",
380
+ "performance": "Step 1 - Syntax checking and bug fixes: keep the implementation correct first, then isolate the slow section without changing external behavior.",
381
+ }[label]
382
+ step_two = (
383
+ "Step 2 - Edge case handling: verify empty input, boundary values, missing fields, and final-state flush behavior "
384
+ f"against the known pattern `{matched.title}`."
385
+ )
386
+ step_three = (
387
+ "Step 3 - Scalability of code: remove repeated full scans, prefer linear-time data structures, "
388
+ "and benchmark the path on a production-like fixture."
389
+ )
390
+ if context:
391
+ step_two = f"{step_two} Context window to preserve: {context}"
392
+ return [step_one, step_two, step_three]
393
+
394
+ def _reference_quality_score(self, code: str, matched: TriagePrototype) -> float:
395
+ candidate = self.backend.embed_texts([_sanitize_text(code) or "# empty"])
396
+ match_index = next(index for index, prototype in enumerate(self.prototypes) if prototype.task_id == matched.task_id)
397
+ reference = self._reference_code_embeddings()[match_index : match_index + 1]
398
+ score = float(torch.matmul(candidate, reference.T)[0][0].item())
399
+ return _clamp_unit((score + 1.0) / 2.0)
400
+
401
+    def triage(self, code: str, traceback_text: str = "", context_window: str = "") -> TriageResult:
+        """Run the full triage pipeline on code plus optional failure context."""
+
+        started = time.perf_counter()
+        document = self._build_review_document(code, traceback_text, context_window)
+        signals, heuristic_scores, notes = self._extract_signals(code, traceback_text)
+
+        candidate_embedding = self.backend.embed_texts([document])
+        matched, matched_similarity, prototype_scores = self._nearest_match(candidate_embedding)
+
+        label_similarity = {label: 0.18 for label in LABELS}
+        for prototype in self.prototypes:
+            label_similarity[prototype.label] = max(
+                label_similarity[prototype.label],
+                prototype_scores[prototype.task_id],
+            )
+
+        combined_scores = {
+            label: 0.72 * label_similarity[label] + 0.28 * heuristic_scores[label]
+            for label in LABELS
+        }
+        confidence_scores = _safe_softmax(combined_scores)
+        issue_label = max(LABELS, key=lambda label: confidence_scores[label])
+        top_confidence = confidence_scores[issue_label]
+
+        top_signal = signals[0].evidence if signals else "Model similarity dominated the decision."
+        ml_quality_score = self._reference_quality_score(code, matched)
+        lint_score = _lint_score(code)
+        complexity_penalty = _complexity_penalty(code)
+        reward_score = _clamp_unit((0.5 * ml_quality_score) + (0.3 * lint_score) - (0.2 * complexity_penalty))
+        summary = (
+            f"Detected a {issue_label} issue with {top_confidence:.0%} confidence. "
+            f"The closest known failure pattern is `{matched.title}`, which indicates {matched.summary.lower()}. "
+            f"Predicted quality score is {ml_quality_score:.0%} with an RL-ready reward of {reward_score:.0%}."
+        )
+        suggested_next_action = {
+            "syntax": "Fix the parser error first, then rerun validation before changing behavior.",
+            "logic": "Step through the smallest failing case and confirm the final branch/update behavior.",
+            "performance": "Replace repeated full-list scans with a linear-time aggregation strategy, then benchmark it.",
+        }[issue_label]
+
+        return TriageResult(
+            issue_label=issue_label,
+            confidence_scores=confidence_scores,
+            repair_risk=_repair_risk(issue_label, top_confidence, len(signals)),
+            ml_quality_score=ml_quality_score,
+            lint_score=lint_score,
+            complexity_penalty=complexity_penalty,
+            reward_score=reward_score,
+            summary=summary,
+            matched_pattern=PrototypeMatch(
+                task_id=matched.task_id,
+                title=matched.title,
+                label=matched.label,
+                similarity=round(matched_similarity, 4),
+                summary=matched.summary,
+                rationale=top_signal,
+            ),
+            repair_plan=self._repair_plan(issue_label, matched, context_window),
+            suggested_next_action=suggested_next_action,
+            extracted_signals=signals,
+            model_backend=self.backend.backend_name,
+            model_id=self.backend.model_id,
+            inference_notes=list(self.backend.notes) + notes,
+            analysis_time_ms=round((time.perf_counter() - started) * 1000.0, 2),
+        )
+
+
+@lru_cache(maxsize=1)
+def get_default_engine() -> CodeTriageEngine:
+    """Return a cached triage engine for the running process."""
+
+    return CodeTriageEngine()
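The scoring step above blends embedding similarity (weight 0.72) with heuristic scores (weight 0.28) and normalizes the result through `_safe_softmax`, which is defined elsewhere in this module. The sketch below reproduces that blend with hypothetical input values and an assumed max-shifted softmax, so the exact numbers are illustrative rather than taken from the engine:

```python
import math

LABELS = ("syntax", "logic", "performance")

def safe_softmax(scores):
    # Subtract the max score before exponentiating so large values cannot overflow.
    peak = max(scores.values())
    exps = {label: math.exp(value - peak) for label, value in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}

# Hypothetical similarity/heuristic values for one candidate snippet.
label_similarity = {"syntax": 0.91, "logic": 0.34, "performance": 0.22}
heuristic_scores = {"syntax": 0.80, "logic": 0.10, "performance": 0.10}

combined = {
    label: 0.72 * label_similarity[label] + 0.28 * heuristic_scores[label]
    for label in LABELS
}
confidence = safe_softmax(combined)
issue_label = max(LABELS, key=lambda label: confidence[label])
print(issue_label)  # "syntax" wins for these inputs
```

Because softmax is monotonic, the label with the highest combined score always wins; the normalization only turns the raw blend into a probability-like confidence profile.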
triage_catalog.py ADDED
@@ -0,0 +1,134 @@
+"""Curated prototypes and example inputs for TorchReview Copilot."""
+
+from __future__ import annotations
+
+from typing import Dict, List
+
+try:
+    from .triage_models import IssueLabel, TriageExample, TriagePrototype
+    from .tasks import list_tasks
+except ImportError:
+    from triage_models import IssueLabel, TriageExample, TriagePrototype
+    from tasks import list_tasks
+
+
+TASK_KIND_TO_LABEL: Dict[str, IssueLabel] = {
+    "syntax_fix": "syntax",
+    "bug_fix": "logic",
+    "optimization": "performance",
+}
+
+TRACEBACK_BY_TASK_ID: Dict[str, str] = {
+    "syntax_fix_invoice_totals": (
+        "Traceback (most recent call last):\n"
+        " File \"services/billing/reconciliation.py\", line 3\n"
+        " for record in records\n"
+        " ^\n"
+        "SyntaxError: expected ':'"
+    ),
+    "bug_fix_session_windows": (
+        "AssertionError: collapse_sessions([{'minute': 1}, {'minute': 3}, {'minute': 8}], 4)\n"
+        "Expected: [(1, 3), (8, 8)]\n"
+        "Actual: [(1, 8)]\n"
+        "Boundary handling merges the final session instead of starting a new one."
+    ),
+    "optimization_rank_active_users": (
+        "BenchmarkWarning: rank_active_users exceeded the 450ms budget on a nightly export fixture.\n"
+        "Profiler hint: repeated scans over the full event list and nested loops dominate runtime."
+    ),
+}
+
+SUMMARY_BY_TASK_ID: Dict[str, str] = {
+    "syntax_fix_invoice_totals": "Broken parser state in a billing helper blocks reconciliation jobs.",
+    "bug_fix_session_windows": "Session-boundary logic fails on inclusive idle-timeout edges.",
+    "optimization_rank_active_users": "A nightly ranking job is correct on small fixtures but too slow at production scale.",
+}
+
+CONTEXT_BY_TASK_ID: Dict[str, str] = {
+    "syntax_fix_invoice_totals": (
+        "Context window: this helper runs in an end-of-day billing reconciliation job. "
+        "Keep the public function signature intact and restore correct totals for mixed integer/string inputs."
+    ),
+    "bug_fix_session_windows": (
+        "Context window: this function groups sorted product analytics events into sessions for retention dashboards. "
+        "Boundary behavior must stay deterministic because downstream reports depend on it."
+    ),
+    "optimization_rank_active_users": (
+        "Context window: this pipeline feeds a nightly export on a small CPU instance. "
+        "Maintain identical output ordering while improving scalability on larger event volumes."
+    ),
+}
+
+
+def _prototype_text(
+    task_id: str,
+    title: str,
+    description: str,
+    repo_summary: str,
+    goal: str,
+    visible_tests: List[str],
+    starter_code: str,
+    traceback_text: str,
+) -> str:
+    visible = "\n".join(f"- {item}" for item in visible_tests) or "- none"
+    return (
+        f"Title: {title}\n"
+        f"Problem: {description}\n"
+        f"Repo context: {repo_summary}\n"
+        f"Goal: {goal}\n"
+        f"Observed failure:\n{traceback_text}\n"
+        f"Visible checks:\n{visible}\n"
+        f"Candidate code:\n{starter_code}\n"
+        f"Task id: {task_id}\n"
+    )
+
+
+def build_examples() -> List[TriageExample]:
+    """Create stable UI examples from the task catalog."""
+
+    examples: List[TriageExample] = []
+    for task in list_tasks():
+        label = TASK_KIND_TO_LABEL[task.task_kind]
+        examples.append(
+            TriageExample(
+                key=task.task_id,
+                title=task.title,
+                label=label,
+                summary=SUMMARY_BY_TASK_ID[task.task_id],
+                code=task.starter_code,
+                traceback_text=TRACEBACK_BY_TASK_ID[task.task_id],
+                context_window=CONTEXT_BY_TASK_ID[task.task_id],
+                task_id=task.task_id,
+            )
+        )
+    return examples
+
+
+def build_prototypes() -> List[TriagePrototype]:
+    """Build canonical triage prototypes from the OpenEnv tasks."""
+
+    prototypes: List[TriagePrototype] = []
+    for task in list_tasks():
+        traceback_text = TRACEBACK_BY_TASK_ID[task.task_id]
+        prototypes.append(
+            TriagePrototype(
+                task_id=task.task_id,
+                title=task.title,
+                label=TASK_KIND_TO_LABEL[task.task_kind],
+                summary=SUMMARY_BY_TASK_ID[task.task_id],
+                reference_text=_prototype_text(
+                    task.task_id,
+                    task.title,
+                    task.task_description,
+                    task.repo_summary,
+                    task.goal,
+                    list(task.visible_tests),
+                    task.reference_code,
+                    traceback_text,
+                ),
+                starter_code=task.starter_code,
+                reference_code=task.reference_code,
+                traceback_text=traceback_text,
+            )
+        )
+    return prototypes
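`_prototype_text` above is plain string assembly, so its behavior is easy to verify in isolation. The snippet below re-declares the helper with toy field values (all hypothetical, not from the task catalog) to show the `- none` fallback when a task has no visible tests:

```python
from typing import List

def _prototype_text(
    task_id: str,
    title: str,
    description: str,
    repo_summary: str,
    goal: str,
    visible_tests: List[str],
    starter_code: str,
    traceback_text: str,
) -> str:
    # Mirrors the catalog helper: bullet each visible test, fall back to "- none".
    visible = "\n".join(f"- {item}" for item in visible_tests) or "- none"
    return (
        f"Title: {title}\n"
        f"Problem: {description}\n"
        f"Repo context: {repo_summary}\n"
        f"Goal: {goal}\n"
        f"Observed failure:\n{traceback_text}\n"
        f"Visible checks:\n{visible}\n"
        f"Candidate code:\n{starter_code}\n"
        f"Task id: {task_id}\n"
    )

text = _prototype_text(
    "demo_task", "Demo", "Example problem.", "Toy repo.", "Fix it.",
    [], "def f():\n    pass", "ValueError: boom",
)
```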
triage_models.py ADDED
@@ -0,0 +1,79 @@
+"""Typed models for TorchReview Copilot outputs and examples."""
+
+from __future__ import annotations
+
+from typing import Dict, List, Literal
+
+from pydantic import BaseModel, Field
+
+
+IssueLabel = Literal["syntax", "logic", "performance"]
+RiskLevel = Literal["low", "medium", "high"]
+
+
+class TriageSignal(BaseModel):
+    """One extracted signal used during issue classification."""
+
+    name: str
+    value: str
+    impact: Literal["syntax", "logic", "performance", "mixed"] = "mixed"
+    weight: float = Field(..., ge=0.0, le=1.0)
+    evidence: str = ""
+
+
+class PrototypeMatch(BaseModel):
+    """Nearest known bug pattern from the built-in task catalog."""
+
+    task_id: str
+    title: str
+    label: IssueLabel
+    similarity: float = Field(..., ge=0.0, le=1.0)
+    summary: str
+    rationale: str
+
+
+class TriageExample(BaseModel):
+    """Example payload exposed in the demo UI."""
+
+    key: str
+    title: str
+    label: IssueLabel
+    summary: str
+    code: str
+    traceback_text: str
+    context_window: str
+    task_id: str
+
+
+class TriagePrototype(BaseModel):
+    """Canonical issue-pattern representation embedded by the triage engine."""
+
+    task_id: str
+    title: str
+    label: IssueLabel
+    summary: str
+    reference_text: str
+    starter_code: str
+    reference_code: str
+    traceback_text: str
+
+
+class TriageResult(BaseModel):
+    """Structured output produced by the triage pipeline."""
+
+    issue_label: IssueLabel
+    confidence_scores: Dict[str, float]
+    repair_risk: RiskLevel
+    ml_quality_score: float = Field(..., ge=0.0, le=1.0)
+    lint_score: float = Field(..., ge=0.0, le=1.0)
+    complexity_penalty: float = Field(..., ge=0.0, le=1.0)
+    reward_score: float = Field(..., ge=0.0, le=1.0)
+    summary: str
+    matched_pattern: PrototypeMatch
+    repair_plan: List[str]
+    suggested_next_action: str
+    extracted_signals: List[TriageSignal] = Field(default_factory=list)
+    model_backend: str
+    model_id: str
+    inference_notes: List[str] = Field(default_factory=list)
+    analysis_time_ms: float = Field(..., ge=0.0)
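The `Field(..., ge=0.0, le=1.0)` declarations above make pydantic reject any score outside the unit interval at construction time. A dependency-free sketch of the same bound check (the class name and error message here are illustrative, not from the codebase):

```python
from dataclasses import dataclass

@dataclass
class ScoreBounds:
    """Illustrative stand-in for a pydantic model with ge=0.0/le=1.0 fields."""

    similarity: float

    def __post_init__(self) -> None:
        # Reject values outside the unit interval, as Field(ge=0.0, le=1.0) would.
        if not 0.0 <= self.similarity <= 1.0:
            raise ValueError("similarity must lie in [0.0, 1.0]")

ok = ScoreBounds(similarity=0.87)       # accepted
try:
    ScoreBounds(similarity=1.2)         # out of range
    rejected = False
except ValueError:
    rejected = True
```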
utils/__init__.py ADDED
@@ -0,0 +1,6 @@
+"""Utility helpers for AST parsing and complexity scoring."""
+
+from .ast_parser import parse_code_structure
+from .complexity import estimate_complexity
+
+__all__ = ["parse_code_structure", "estimate_complexity"]
utils/ast_parser.py ADDED
@@ -0,0 +1,144 @@
+"""Static parsing helpers for multi-domain Python code analysis."""
+
+from __future__ import annotations
+
+import ast
+from typing import Any, Dict, List
+
+
+class _LoopDepthVisitor(ast.NodeVisitor):
+    """Collect loop nesting depth for a parsed Python module."""
+
+    def __init__(self) -> None:
+        self.depth = 0
+        self.max_depth = 0
+
+    def _visit_loop(self, node: ast.AST) -> None:
+        self.depth += 1
+        self.max_depth = max(self.max_depth, self.depth)
+        self.generic_visit(node)
+        self.depth -= 1
+
+    def visit_For(self, node: ast.For) -> None:  # noqa: N802
+        self._visit_loop(node)
+
+    def visit_While(self, node: ast.While) -> None:  # noqa: N802
+        self._visit_loop(node)
+
+    def visit_comprehension(self, node: ast.comprehension) -> None:  # noqa: N802
+        self._visit_loop(node)
+
+
+def parse_code_structure(code: str) -> Dict[str, Any]:
+    """Parse Python code into reusable structural signals."""
+
+    summary: Dict[str, Any] = {
+        "syntax_valid": True,
+        "syntax_error": "",
+        "imports": [],
+        "function_names": [],
+        "class_names": [],
+        "loop_count": 0,
+        "branch_count": 0,
+        "max_loop_depth": 0,
+        "line_count": len(code.splitlines()),
+        "long_lines": 0,
+        "tabs_used": "\t" in code,
+        "trailing_whitespace_lines": 0,
+        "uses_numpy": False,
+        "uses_pandas": False,
+        "uses_torch": False,
+        "uses_sklearn": False,
+        "uses_fastapi": False,
+        "uses_flask": False,
+        "uses_pydantic": False,
+        "uses_recursion": False,
+        "calls_eval": False,
+        "calls_no_grad": False,
+        "calls_backward": False,
+        "calls_optimizer_step": False,
+        "route_decorators": [],
+        "docstring_ratio": 0.0,
+        "code_smells": [],
+    }
+
+    lines = code.splitlines()
+    summary["long_lines"] = sum(1 for line in lines if len(line) > 88)
+    summary["trailing_whitespace_lines"] = sum(1 for line in lines if line.rstrip() != line)
+
+    try:
+        tree = ast.parse(code)
+    except SyntaxError as exc:
+        summary["syntax_valid"] = False
+        summary["syntax_error"] = f"{exc.msg} (line {exc.lineno})"
+        summary["code_smells"].append("Code does not parse.")
+        return summary
+
+    visitor = _LoopDepthVisitor()
+    visitor.visit(tree)
+    summary["max_loop_depth"] = visitor.max_depth
+
+    functions = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
+    summary["function_names"] = [node.name for node in functions]
+    summary["class_names"] = [node.name for node in tree.body if isinstance(node, ast.ClassDef)]
+    summary["docstring_ratio"] = (
+        sum(1 for node in functions if ast.get_docstring(node)) / len(functions)
+        if functions
+        else 0.0
+    )
+
+    imports: List[str] = []
+    for node in ast.walk(tree):
+        if isinstance(node, ast.Import):
+            imports.extend(alias.name.split(".")[0] for alias in node.names)
+        elif isinstance(node, ast.ImportFrom) and node.module:
+            imports.append(node.module.split(".")[0])
+        elif isinstance(node, (ast.For, ast.While, ast.comprehension)):
+            summary["loop_count"] += 1
+        elif isinstance(node, (ast.If, ast.Try, ast.Match)):
+            summary["branch_count"] += 1
+        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
+            attr = node.func.attr
+            if attr == "eval":
+                summary["calls_eval"] = True
+            elif attr == "backward":
+                summary["calls_backward"] = True
+            elif attr == "step":
+                summary["calls_optimizer_step"] = True
+        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == "print":
+            summary["code_smells"].append("Debug print statements are present.")
+        elif isinstance(node, ast.With):
+            if any(
+                isinstance(item.context_expr, ast.Call)
+                and isinstance(item.context_expr.func, ast.Attribute)
+                and item.context_expr.func.attr == "no_grad"
+                for item in node.items
+            ):
+                summary["calls_no_grad"] = True
+
+    import_set = sorted(set(imports))
+    summary["imports"] = import_set
+    summary["uses_numpy"] = "numpy" in import_set or "np" in code
+    summary["uses_pandas"] = "pandas" in import_set or "pd" in code
+    summary["uses_torch"] = "torch" in import_set
+    summary["uses_sklearn"] = "sklearn" in import_set
+    summary["uses_fastapi"] = "fastapi" in import_set
+    summary["uses_flask"] = "flask" in import_set
+    summary["uses_pydantic"] = "pydantic" in import_set or "BaseModel" in code
+
+    for node in functions:
+        for child in ast.walk(node):
+            if isinstance(child, ast.Call) and isinstance(child.func, ast.Name) and child.func.id == node.name:
+                summary["uses_recursion"] = True
+
+    for node in ast.walk(tree):
+        if isinstance(node, ast.FunctionDef):
+            for decorator in node.decorator_list:
+                if isinstance(decorator, ast.Call) and isinstance(decorator.func, ast.Attribute):
+                    summary["route_decorators"].append(decorator.func.attr)
+                elif isinstance(decorator, ast.Attribute):
+                    summary["route_decorators"].append(decorator.attr)
+
+    if summary["long_lines"]:
+        summary["code_smells"].append("Long lines reduce readability.")
+    if summary["tabs_used"]:
+        summary["code_smells"].append("Tabs detected; prefer spaces for consistency.")
+    if summary["trailing_whitespace_lines"]:
+        summary["code_smells"].append("Trailing whitespace found.")
+
+    return summary
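The enter/record/recurse/leave pattern that `_LoopDepthVisitor` uses above is easy to exercise on its own. This reduced sketch tracks only `for`/`while` nesting (dropping the comprehension handling) to show how the running maximum captures the deepest loop level:

```python
import ast

class LoopDepthVisitor(ast.NodeVisitor):
    """Track the deepest for/while nesting in a parsed module."""

    def __init__(self) -> None:
        self.depth = 0
        self.max_depth = 0

    def _visit_loop(self, node: ast.AST) -> None:
        # Enter the loop, record the running maximum, recurse, then leave.
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        self.generic_visit(node)
        self.depth -= 1

    # Alias both loop node types to the shared handler.
    visit_For = _visit_loop
    visit_While = _visit_loop

snippet = "for i in range(3):\n    for j in range(3):\n        total = i * j\n"
visitor = LoopDepthVisitor()
visitor.visit(ast.parse(snippet))
print(visitor.max_depth)  # → 2
```

Decrementing `depth` after `generic_visit` is what makes sibling loops at the same level count once rather than accumulate.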
utils/complexity.py ADDED
@@ -0,0 +1,37 @@
+"""Complexity heuristics for DSA-style and general Python code."""
+
+from __future__ import annotations
+
+from typing import Any, Dict
+
+
+def estimate_complexity(parsed: Dict[str, Any], code: str) -> Dict[str, Any]:
+    """Estimate cyclomatic complexity and rough Big-O heuristics."""
+
+    cyclomatic = 1 + int(parsed.get("branch_count", 0))
+    loop_depth = int(parsed.get("max_loop_depth", 0))
+    uses_recursion = bool(parsed.get("uses_recursion", False))
+
+    if loop_depth >= 3:
+        time_complexity = "O(n^3)"
+    elif loop_depth == 2:
+        time_complexity = "O(n^2)"
+    elif "sorted(" in code or ".sort(" in code:
+        time_complexity = "O(n log n)"
+    elif loop_depth == 1 or uses_recursion:
+        time_complexity = "O(n)"
+    else:
+        time_complexity = "O(1)"
+
+    if "append(" in code or "list(" in code or "dict(" in code or "set(" in code:
+        space_complexity = "O(n)"
+    else:
+        space_complexity = "O(1)"
+
+    complexity_penalty = min(0.99, 0.08 + (cyclomatic * 0.04) + (loop_depth * 0.12))
+    return {
+        "cyclomatic_complexity": cyclomatic,
+        "time_complexity": time_complexity,
+        "space_complexity": space_complexity,
+        "complexity_penalty": round(complexity_penalty, 4),
+    }
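As a quick sanity check on the heuristics above, a doubly nested loop with two branches should score as quadratic time with a moderate penalty. This sketch inlines the function (lightly restructured) so it runs standalone; the sample `parsed` dict is hand-written for illustration, not produced by `parse_code_structure`:

```python
from typing import Any, Dict

def estimate_complexity(parsed: Dict[str, Any], code: str) -> Dict[str, Any]:
    # Same heuristic as the module above: loop depth drives the Big-O guess,
    # branches drive the cyclomatic count, and both feed the penalty term.
    cyclomatic = 1 + int(parsed.get("branch_count", 0))
    loop_depth = int(parsed.get("max_loop_depth", 0))
    uses_recursion = bool(parsed.get("uses_recursion", False))

    if loop_depth >= 3:
        time_complexity = "O(n^3)"
    elif loop_depth == 2:
        time_complexity = "O(n^2)"
    elif "sorted(" in code or ".sort(" in code:
        time_complexity = "O(n log n)"
    elif loop_depth == 1 or uses_recursion:
        time_complexity = "O(n)"
    else:
        time_complexity = "O(1)"

    # Any growable-container token flips the space estimate to linear.
    space_complexity = "O(n)" if any(tok in code for tok in ("append(", "list(", "dict(", "set(")) else "O(1)"
    complexity_penalty = min(0.99, 0.08 + (cyclomatic * 0.04) + (loop_depth * 0.12))
    return {
        "cyclomatic_complexity": cyclomatic,
        "time_complexity": time_complexity,
        "space_complexity": space_complexity,
        "complexity_penalty": round(complexity_penalty, 4),
    }

parsed = {"branch_count": 2, "max_loop_depth": 2, "uses_recursion": False}
result = estimate_complexity(parsed, "for i in xs:\n    for j in xs:\n        out.append(i + j)\n")
print(result["time_complexity"])  # → O(n^2)
```

With `branch_count=2` the cyclomatic count is 3, and the penalty works out to 0.08 + 3*0.04 + 2*0.12 = 0.44, comfortably under the 0.99 cap.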