🔥 Roast My Repo
Paste a public GitHub URL into Roast My Repo and get a brutal, specific, funny code review in seconds, scored across Code Quality, Documentation, Security, Structure, and Portfolio Value. Plus a generated README you can actually use.
Try it: huggingface.co/spaces/build-small-hackathon/roast-my-repo
How it works
A run has three steps.
- Fetch. The GitHub REST API pulls repo metadata, the full file tree, and up to 12 key files in priority order: README, entry points, config files, dependency manifests. Files named .env are detected but never read.
- Analyze. Two MiniCPM4-8B calls run in parallel: one returns structured JSON (roast, scorecard, red flags, hire score), the other returns a plain markdown README.
- Classify. Before rendering, a classifier checks stars, contributors, and open issues. If any threshold fires, an amber warning banner explains that portfolio scoring doesn't apply to libraries or frameworks.
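The Fetch step's "priority order" can be pictured as a tiered filter over the file tree. Below is a minimal sketch, assuming the tree arrives as a flat list of paths; the tier contents, `select_key_files`, and `is_env_file` are hypothetical illustrations, not the project's actual lists or functions.

```python
from pathlib import PurePosixPath

# Hypothetical priority tiers, in the order the write-up lists them.
PRIORITY_TIERS = [
    {"readme.md", "readme.rst", "readme"},                            # README
    {"main.py", "app.py", "index.js", "main.go", "main.rs"},          # entry points
    {"dockerfile", "tsconfig.json", "setup.cfg", "vite.config.js"},   # config files
    {"requirements.txt", "pyproject.toml", "package.json", "go.mod", "cargo.toml"},  # dependency manifests
]
MAX_FILES = 12


def is_env_file(path: str) -> bool:
    """True for .env and .env.* files: their presence is recorded, their contents never read."""
    name = PurePosixPath(path).name
    return name == ".env" or name.startswith(".env.")


def select_key_files(paths: list[str], limit: int = MAX_FILES) -> list[str]:
    """Pick up to `limit` paths tier by tier; within a tier, shallower paths win."""
    candidates = [p for p in paths if not is_env_file(p)]
    selected: list[str] = []
    for tier in PRIORITY_TIERS:
        matches = sorted(
            (p for p in candidates if PurePosixPath(p).name.lower() in tier),
            key=lambda p: (p.count("/"), p),  # depth first, then alphabetical
        )
        for path in matches:
            if len(selected) == limit:
                return selected
            selected.append(path)
    return selected


tree = ["src/app.py", "README.md", ".env", "package.json", "docs/README.md", "pyproject.toml"]
print(select_key_files(tree))
# ['README.md', 'docs/README.md', 'src/app.py', 'package.json', 'pyproject.toml']
```

Sorting by path depth is just one possible tie-breaker: a root README usually says more about a repo than one buried under docs/.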
GitHub URL
   │
   ▼
Fetch metadata + file tree + up to 12 key files
   │
   ├── Call 1 (parallel): Structured JSON
   │       roast · scorecard · red_flags · hire_score · hire_verdict
   │
   ├── Call 2 (parallel): Plain markdown README
   │
   ▼
Classify repo → render terminal UI
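The two parallel calls map naturally onto a thread pool, since each call spends most of its time waiting on the network. This is a sketch assuming a blocking `call_model(prompt) -> str` function (hypothetical) and heavily abbreviated prompts; it also strips any `<think>...</think>` block before parsing, on the assumption that reasoning text can precede the JSON answer.

```python
import json
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Drop chain-of-thought blocks so only the final answer remains."""
    return THINK_BLOCK.sub("", text).strip()


def analyze(context: str, call_model: Callable[[str], str]) -> tuple[dict, str]:
    """Run the JSON roast call and the README call concurrently."""
    roast_prompt = (
        "Return only JSON with keys roast, scorecard, red_flags, "
        "hire_score, hire_verdict.\n\n" + context
    )
    readme_prompt = "Write a README for this repo in plain markdown.\n\n" + context
    with ThreadPoolExecutor(max_workers=2) as pool:
        roast_future = pool.submit(call_model, roast_prompt)
        readme_future = pool.submit(call_model, readme_prompt)
        # .result() blocks until done and re-raises any exception from the worker.
        roast = json.loads(strip_think(roast_future.result()))
        readme = strip_think(readme_future.result())
    return roast, readme
```

Total latency becomes roughly that of the slower call rather than the sum of both.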
Why MiniCPM4-8B
The constraints were real: a free-tier GPU on Modal, a cold-start budget under 2 minutes, and a model that fits on a single A10G (24GB VRAM). MiniCPM4-8B from OpenBMB checks all three: it was trained on 8 trillion tokens, fits comfortably in fp16, and produces chain-of-thought reasoning in <think> tags, which actually matters for code review. A 70B model would need multiple GPUs, balloon costs, and make cold starts unbearable. For reviewing 12 files at 3,000 characters each, 8B is enough.
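A back-of-the-envelope check of those sizing claims, assuming a nominal 8 billion parameters, 2 bytes per parameter in fp16, and a rough 4 characters per token for source code (a common heuristic, not a measurement of MiniCPM's tokenizer):

```python
PARAMS = 8e9                 # nominal parameter count
BYTES_PER_PARAM_FP16 = 2
A10G_VRAM_GB = 24

weights_gb = PARAMS * BYTES_PER_PARAM_FP16 / 1e9   # 16.0 GB of weights
headroom_gb = A10G_VRAM_GB - weights_gb            # 8.0 GB left for KV cache and activations

FILES, CHARS_PER_FILE, CHARS_PER_TOKEN = 12, 3000, 4
prompt_chars = FILES * CHARS_PER_FILE                   # 36,000 characters of code
approx_prompt_tokens = prompt_chars // CHARS_PER_TOKEN  # roughly 9,000 tokens

print(weights_gb, headroom_gb, approx_prompt_tokens)
```

Real memory use is higher (CUDA context, framework overhead), but the margin shows why 8B fits on one card while a 70B model, at about 140 GB of fp16 weights, cannot.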
What broke
Timeouts on large repos
Large repos were hitting a 120-second timeout and crashing:
HTTPSConnectionPool: Read timed out. (read timeout=120)
Switching from a blocking POST to streaming via sseclient fixed it: tokens arrive as they're generated, so the connection stays alive throughout. The timeout was also bumped to 300 seconds as a safety net.
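The error text above comes from urllib3 via requests, whose read timeout limits the wait between bytes received, not the total request time, so a response that keeps emitting tokens never trips it. The sketch below parses the server-sent-events stream by hand instead of using sseclient (to avoid guessing at that package's API), and assumes an OpenAI-compatible chat-completions stream of `data: {...}` chunks ending in `data: [DONE]`; `stream_completion` and its URL are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield each event's data payload; an event ends at a blank line."""
    buffer: list[str] = []
    for line in lines:
        if line == "":
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[5:]
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (": keep-alive") and event:/id:/retry: fields are ignored here.
    if buffer:  # tolerate a stream that ends without a final blank line
        yield "\n".join(buffer)


def iter_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield text deltas, ASSUMING an OpenAI-style chat-completions stream."""
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":
            return
        delta = json.loads(payload)["choices"][0].get("delta", {})
        if delta.get("content"):
            yield delta["content"]


def stream_completion(url: str, body: dict) -> str:
    """Hypothetical client call: stream=True plus a 300 s read timeout."""
    import requests  # third-party; imported here so the parsers above need only the stdlib

    with requests.post(url, json={**body, "stream": True}, stream=True, timeout=(10, 300)) as resp:
        resp.raise_for_status()
        lines = (raw.decode("utf-8") for raw in resp.iter_lines())
        return "".join(iter_tokens(lines))
```

With `timeout=(10, 300)`, requests allows 10 seconds to connect and up to 300 seconds of silence between bytes, which matches the safety net described above.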
The model hallucinating security flags
The model confidently flagged a committed .env file on repos that didn't have one, and docked points for a missing .env.example on projects with no need for secrets.
Root cause: LLMs pattern-match "junior dev repo" → "probably has .env issues" from their training data. The fix was injecting verified facts at the top of the prompt, before the repo context:
GROUND TRUTH:
- .ENV FILE COMMITTED: FALSE - only flag this if TRUE
- .ENV.EXAMPLE EXISTS: TRUE
Concrete facts placed before the context beat instructions placed after it.
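A minimal sketch of building that header from the file tree alone, so the facts come from filename detection rather than from reading any .env contents. `build_ground_truth` and `build_prompt` are hypothetical names, exact-filename matching is an assumption, and so is placing the task instructions last:

```python
from pathlib import PurePosixPath


def build_ground_truth(paths: list[str]) -> str:
    """Turn file-tree facts into a block the model sees before any repo content."""
    names = {PurePosixPath(p).name for p in paths}
    env_committed = ".env" in names          # detected by name only, never opened
    example_exists = ".env.example" in names
    return "\n".join([
        "GROUND TRUTH:",
        f"- .ENV FILE COMMITTED: {str(env_committed).upper()} - only flag this if TRUE",
        f"- .ENV.EXAMPLE EXISTS: {str(example_exists).upper()}",
    ])


def build_prompt(paths: list[str], context: str, instructions: str) -> str:
    # Verified facts first, then the repo context, then the task instructions.
    return "\n\n".join([build_ground_truth(paths), context, instructions])
```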
Score anchoring
Every repo scored 5-6 regardless of quality. The culprit was the example JSON in the prompt: it used "score": 7 and "score": 6 as placeholder values, and the model anchored to those numbers.
Replacing them with <INT 1-10> and adding explicit definitions fixed calibration:
- Score 1-2: missing basics, embarrassing to put on a portfolio
- Committed .env → security score is 1, no exceptions
- Fewer than 5 meaningful files → structure and portfolio_value are 1-2
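The same anchors can also be enforced in code once the JSON is parsed. This is an illustrative post-processing guard, not something the write-up says the project does: it rejects non-integer or out-of-range scores (including a leftover `<INT 1-10>` placeholder) and applies the two hard rules above. Key names other than `security`, `structure`, and `portfolio_value` are assumptions.

```python
SCORE_KEYS = ("code_quality", "documentation", "security", "structure", "portfolio_value")

# Prompt-side template: placeholders instead of example numbers, so the
# model has no concrete value to anchor on.
SCORECARD_TEMPLATE = "{" + ", ".join(f'"{k}": <INT 1-10>' for k in SCORE_KEYS) + "}"


def apply_score_rules(scorecard: dict, env_committed: bool, meaningful_files: int) -> dict:
    """Validate model scores, then apply the hard caps from the rubric."""
    scores = {}
    for key in SCORE_KEYS:
        value = scorecard.get(key)
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 10:
            raise ValueError(f"{key} must be an integer from 1 to 10, got {value!r}")
        scores[key] = value
    if env_committed:
        scores["security"] = 1  # committed .env: security is 1, no exceptions
    if meaningful_files < 5:
        scores["structure"] = min(scores["structure"], 2)
        scores["portfolio_value"] = min(scores["portfolio_value"], 2)
    return scores
```

Keeping the caps in code means a model that ignores the rubric still cannot award a high security score to a repo with a committed .env.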
Portfolio scoring on libraries
A "Hire Me Score" for flask or requests is a nonsense metric. A classifier now runs before analysis with three thresholds:
LARGE_REPO_THRESHOLDS = {
"stars": 500,
"contributors": 10,
"open_issues": 50,
}
If any fires, an amber warning banner appears explaining the scoring context. Contributor count is fetched with GitHub's Link header trick: one API call reads the last page number instead of paginating through all contributors.
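A sketch of both pieces. It assumes contributors are requested with `per_page=1`, so the page number in the `rel="last"` link equals the contributor count, and it treats a threshold as firing when a value meets or exceeds it (the write-up doesn't say whether the comparison is inclusive). `last_page_from_link`, `contributor_count`, and `fired_thresholds` are hypothetical helpers; with requests, `response.links` already exposes the parsed Link header.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

LARGE_REPO_THRESHOLDS = {"stars": 500, "contributors": 10, "open_issues": 50}

# One entry of a Link header: <url>; rel="name"
LINK_PART = re.compile(r'\s*<([^>]+)>\s*;\s*rel="([^"]+)"')


def last_page_from_link(link_header: str) -> Optional[int]:
    """Return the page number of the rel="last" link, or None if there is none."""
    for part in link_header.split(","):
        match = LINK_PART.match(part)
        if match and "last" in match.group(2).split():
            query = parse_qs(urlparse(match.group(1)).query)
            return int(query["page"][0])
    return None


def contributor_count(link_header: str, first_page_len: int) -> int:
    # No rel="last" link means every contributor fit on the first page.
    last = last_page_from_link(link_header)
    return last if last is not None else first_page_len


def fired_thresholds(stats: dict) -> list[str]:
    """Names of thresholds met or exceeded; any hit triggers the amber banner."""
    return [name for name, limit in LARGE_REPO_THRESHOLDS.items() if stats.get(name, 0) >= limit]
```

The `stars` and `open_issues` values would come from the repo metadata fields `stargazers_count` and `open_issues_count`; note that GitHub's open-issue count also includes open pull requests.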
Try it
🔥 huggingface.co/spaces/build-small-hackathon/roast-my-repo
Paste any public GitHub URL. Brace yourself.
Built with MiniCPM4-8B by OpenBMB, served on Modal.
