🔥 Roast My Repo

Community Article Published June 13, 2026

Most CS students have the same problem: their GitHub repos look worse than their actual skills. Not because they can't code, but because nobody tells them what's wrong. Your friends won't say it. Your college won't say it. And you don't notice it until a recruiter closes the tab.

Paste a public GitHub URL into Roast My Repo and get a brutal, specific, funny code review in seconds — scored across Code Quality, Documentation, Security, Structure, and Portfolio Value. Plus a generated README you can actually use.

Try it: huggingface.co/spaces/build-small-hackathon/roast-my-repo

How it works

A run has three steps.

Fetch. The GitHub REST API pulls repo metadata, the full file tree, and up to 12 key files in priority order — README, entry points, config files, dependency manifests. .env files are detected but never read.
Analyze. Two MiniCPM4-8B calls run in parallel — one returns structured JSON (roast, scorecard, red flags, hire score), the other returns a plain markdown README.
Classify. Before rendering, a classifier checks stars, contributors, and open issues. If any threshold fires, an amber warning banner explains that portfolio scoring doesn't apply to libraries or frameworks.
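
The Fetch step's selection logic can be sketched as follows. This is a minimal illustration, not the project's actual code: the priority buckets, the filename sets, and the helper names (`is_env_file`, `priority`, `select_key_files`) are all assumptions. Only the cap of 12 files, the rough priority order, and the rule that .env files are detected but never read come from the description above.

```python
from __future__ import annotations

from pathlib import PurePosixPath

MAX_FILES = 12  # cap stated in the article

# Hypothetical filename sets for each priority bucket (assumptions).
ENTRY_POINTS = {"main.py", "app.py", "index.js", "index.ts", "main.go"}
CONFIG_FILES = {"dockerfile", "docker-compose.yml", ".gitignore", "makefile"}
MANIFESTS = {"requirements.txt", "pyproject.toml", "package.json", "go.mod"}


def is_env_file(path: str) -> bool:
    """True for .env and variants such as .env.local, but not .env.example."""
    name = PurePosixPath(path).name.lower()
    return name == ".env" or (name.startswith(".env.") and name != ".env.example")


def priority(path: str) -> int | None:
    """Lower rank is fetched first; None means the file is not a key file."""
    name = PurePosixPath(path).name.lower()
    if name.startswith("readme"):
        return 0
    if name in ENTRY_POINTS:
        return 1
    if name in CONFIG_FILES:
        return 2
    if name in MANIFESTS:
        return 3
    return None


def select_key_files(tree: list[str]) -> tuple[list[str], bool]:
    """Return (paths to fetch, whether a .env file was detected)."""
    env_detected = any(is_env_file(p) for p in tree)
    ranked = []
    for path in tree:
        rank = priority(path)
        # .env files are recorded above but never queued for reading.
        if rank is not None and not is_env_file(path):
            # Tie-break on directory depth so top-level files win.
            ranked.append((rank, path.count("/"), path))
    ranked.sort()
    return [path for _, _, path in ranked[:MAX_FILES]], env_detected
```

Returning the detection flag separately from the file list keeps the "detected but never read" rule explicit: the flag can feed the prompt while the file contents never leave GitHub.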

GitHub URL
    │
    ▼
Fetch metadata + file tree + up to 12 key files
    │
    ├── Call 1 (parallel): Structured JSON
    │     roast · scorecard · red_flags · hire_score · hire_verdict
    │
    └── Call 2 (parallel): Plain markdown README
    │
    ▼
Classify repo → render terminal UI
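
One way to run the two calls in parallel is a two-worker thread pool. The sketch below assumes a hypothetical `call_model` callable that takes a prompt string and returns the model's text; the prompt wording is also invented for illustration. Only the JSON keys come from the diagram above.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def run_analysis(context, call_model):
    """Run the JSON review and the README generation concurrently.

    `call_model` is a hypothetical function: prompt string in, model text out.
    """
    json_prompt = (
        "Return ONLY a JSON object with the keys roast, scorecard, "
        "red_flags, hire_score, hire_verdict.\n\n" + context
    )
    readme_prompt = "Write a README in plain markdown for this repo.\n\n" + context

    # Both requests are network-bound, so threads are enough to overlap them.
    with ThreadPoolExecutor(max_workers=2) as pool:
        review_future = pool.submit(call_model, json_prompt)
        readme_future = pool.submit(call_model, readme_prompt)
        review = json.loads(review_future.result())
        readme = readme_future.result()
    return review, readme
```

Keeping the README as a separate plain-text call means a malformed JSON response cannot take the README down with it, and vice versa.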

Why MiniCPM4-8B

The constraints were real: a free-tier GPU on Modal, a cold-start budget under 2 minutes, and a model that fits on a single A10G (24 GB VRAM). MiniCPM4-8B from OpenBMB meets all three. It was trained on 8 trillion tokens, fits comfortably in fp16, and produces chain-of-thought reasoning in <think> tags, which actually matters for code review. A 70B model would need multiple GPUs, balloon costs, and make cold starts unbearable. For reviewing 12 files at 3000 chars each, 8B is enough.
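
The sizing argument is back-of-the-envelope arithmetic. The sketch below treats the parameter count as a nominal 8 billion (an assumption based on the model name) and ignores KV cache and activation memory, which must fit in the remaining headroom.

```python
# Nominal figures; the exact parameter count is an assumption from the name.
PARAMS = 8e9                 # ~8 billion parameters
BYTES_PER_PARAM_FP16 = 2     # fp16 stores each weight in 2 bytes
A10G_VRAM_GB = 24            # single A10G, as stated in the article

weights_gb = PARAMS * BYTES_PER_PARAM_FP16 / 1e9   # 16.0 GB of weights
headroom_gb = A10G_VRAM_GB - weights_gb            # 8.0 GB left over

# Upper bound on the file context sent to the model per review.
MAX_FILES, MAX_CHARS = 12, 3000
context_chars = MAX_FILES * MAX_CHARS              # 36000 characters
```

The same arithmetic for a 70B model gives about 140 GB of fp16 weights, which is why it would not fit on one 24 GB card.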

What broke

Timeouts on large repos

Large repos were hitting a 120-second timeout and crashing:

HTTPSConnectionPool: Read timed out. (read timeout=120)

Switching from a blocking POST to streaming via sseclient fixed it: tokens arrive as they're generated, so the connection stays alive throughout. The timeout was also raised to 300 seconds as a safety net.
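
To show why streaming avoids the read timeout, here is a minimal sketch that parses a Server-Sent Events stream with the standard library only. It deliberately swaps sseclient for a hand-rolled parser so that it stays self-contained. The endpoint URL, the payload shape, and the assumption that each event's data field is a raw text chunk are all hypothetical.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Iterable, Iterator

READ_TIMEOUT_S = 300  # safety net from the article; applies to each socket read


def iter_sse_data(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield the data payload of each complete SSE event.

    An event ends at a blank line. Comment lines (starting with ':') and
    other fields are ignored. A trailing event with no blank line is dropped.
    """
    buffer: list[str] = []
    for raw in lines:
        line = raw.decode("utf-8").rstrip("\r\n")
        if line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips one optional leading space from the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buffer:
            yield "\n".join(buffer)
            buffer = []


def stream_completion(url: str, payload: dict) -> str:
    """POST the payload and join streamed chunks (hypothetical endpoint)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    # Each chunk resets the read timer, so a long generation no longer
    # looks like a dead connection.
    with urllib.request.urlopen(request, timeout=READ_TIMEOUT_S) as response:
        return "".join(iter_sse_data(response))
```

With a blocking POST the client waits for the whole response within one read window; with streaming, the window only has to cover the gap between consecutive chunks.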

The model hallucinating security flags

The model confidently flagged committed .env files on repos that didn't have one, and docked points for missing .env.example on projects that have no need for secrets.

Root cause: LLMs pattern-match on "junior dev repo" → "probably has .env issues" from training data. The fix was injecting verified facts at the top of the prompt before the context:

GROUND TRUTH:
- .ENV FILE COMMITTED: FALSE — only flag this if TRUE
- .ENV.EXAMPLE EXISTS: TRUE

Concrete facts placed before the context beat instructions placed after it.
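
A sketch of how such a block might be derived from the file tree rather than written by hand. The function names and the exact wording of the emitted lines are assumptions modeled on the snippet above; the point is that both facts are computed from the fetched tree, so the model never has to guess them.

```python
from pathlib import PurePosixPath


def ground_truth_block(tree):
    """Build the verified-facts header from the repo's file paths."""
    names = {PurePosixPath(path).name.lower() for path in tree}
    env_committed = ".env" in names
    env_example = ".env.example" in names
    return "\n".join([
        "GROUND TRUTH:",
        f"- .ENV FILE COMMITTED: {str(env_committed).upper()} - only flag this if TRUE",
        f"- .ENV.EXAMPLE EXISTS: {str(env_example).upper()}",
    ])


def build_prompt(tree, context):
    """Place the verified facts before the repo context, as described above."""
    return ground_truth_block(tree) + "\n\n" + context
```

Calling `build_prompt(["README.md", "src/app.py"], "...file contents...")` yields a prompt whose first lines state that no .env file is committed.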

Score anchoring

Every repo scored 5-6 regardless of quality. The culprit was the example JSON in the prompt — it had "score": 7, "score": 6 as placeholders and the model anchored to those numbers.

Replacing them with <INT 1-10> and adding explicit definitions fixed calibration:

- Score 1-2: missing basics, embarrassing to put on a portfolio
- Committed .env → security score is 1, no exceptions
- Fewer than 5 meaningful files → structure and portfolio_value are 1-2
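
The placeholder idea can be sketched like this. The category keys other than `structure` and `portfolio_value`, the nesting of the JSON, and the validation step are assumptions; the article only states that literal numbers in the example were replaced with <INT 1-10>.

```python
import json

# Keys are assumed from the five scorecard categories named in the article.
CATEGORIES = ["code_quality", "documentation", "security", "structure", "portfolio_value"]
PLACEHOLDER = "<INT 1-10>"


def schema_example():
    """Example JSON for the prompt, with no literal number to anchor on."""
    scorecard = {category: {"score": PLACEHOLDER} for category in CATEGORIES}
    return json.dumps({"scorecard": scorecard}, indent=2)


def parse_scores(raw):
    """Parse the model's JSON and reject anything that is not an int in 1..10."""
    scorecard = json.loads(raw)["scorecard"]
    scores = {}
    for category in CATEGORIES:
        value = scorecard[category]["score"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 10:
            raise ValueError(f"bad score for {category}: {value!r}")
        scores[category] = value
    return scores
```

The validation also catches the case where the model echoes the placeholder back instead of filling it in.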

Portfolio scoring on libraries

A "Hire Me Score" for flask or requests is a nonsense metric. A classifier now runs before rendering, with three thresholds:

LARGE_REPO_THRESHOLDS = {
    "stars":        500,
    "contributors": 10,
    "open_issues":  50,
}

If any fires, an amber warning banner appears explaining the scoring context. Contributor count is fetched with GitHub's Link header trick — one API call reads the last page number instead of paginating through all contributors.
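
A minimal sketch of both pieces, under stated assumptions: the comparison is taken to be "greater than or equal", and the contributor request is assumed to use `per_page=1` so that the last page number equals the contributor count. The function names are hypothetical. The Link header format with `rel="last"` is GitHub's documented pagination mechanism.

```python
from __future__ import annotations

import re
from urllib.parse import parse_qs, urlparse

LARGE_REPO_THRESHOLDS = {
    "stars":        500,
    "contributors": 10,
    "open_issues":  50,
}


def fired_thresholds(stats: dict) -> list[str]:
    """Names of thresholds the repo meets or exceeds (>= is an assumption)."""
    return [
        key for key, limit in LARGE_REPO_THRESHOLDS.items()
        if stats.get(key, 0) >= limit
    ]


def last_page_from_link(link_header: str | None) -> int | None:
    """Extract the page number of the rel="last" URL from a Link header.

    Returns None when there is no header or no rel="last" entry, which
    means the result fit on a single page.
    """
    if not link_header:
        return None
    match = re.search(r'<([^>]+)>;\s*rel="last"', link_header)
    if not match:
        return None
    pages = parse_qs(urlparse(match.group(1)).query).get("page")
    return int(pages[0]) if pages else None
```

With `per_page=1`, a header whose last link ends in `page=431` implies 431 contributors from one request. When the function returns None, the count is simply the length of the single page returned.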