title: Search RL Environment
sdk: docker
pinned: false
app_port: 8000
base_path: /web
tags:
- openenv
Search RL Environment
An RL environment for multi-hop document retrieval with explicit context management. It uses the F-beta reward curriculum from the Context-1 paper.
What it does
The agent searches a corpus, reads documents into a limited context window, prunes irrelevant content, and submits an answer. The reward depends on retrieving the right documents and answering correctly.
Actions: search, read, prune, answer
Live demo: https://aman045-openenv-search-rl.hf.space
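The read and prune actions can be pictured with a toy model of a token-limited context window. This is a minimal sketch, not the environment's implementation: the `ContextWindow` class, its method names, and the whitespace-based token count are all hypothetical and chosen only for illustration.

```python
class ContextWindow:
    """Toy model of a limited context window (hypothetical, not the env's class)."""

    def __init__(self, max_tokens):
        self.max_tokens = max_tokens
        self.chunks = {}  # chunk_id -> text currently held in context

    @staticmethod
    def count_tokens(text):
        # Assumption: approximate tokens by whitespace-separated words.
        return len(text.split())

    @property
    def used_tokens(self):
        return sum(self.count_tokens(t) for t in self.chunks.values())

    def read(self, chunk_id, text):
        """Add a chunk if it fits in the remaining budget; return True on success."""
        if chunk_id in self.chunks:
            return True
        if self.used_tokens + self.count_tokens(text) > self.max_tokens:
            return False
        self.chunks[chunk_id] = text
        return True

    def prune(self, chunk_ids):
        """Drop the given chunks from context; return how many were removed."""
        removed = 0
        for chunk_id in chunk_ids:
            if self.chunks.pop(chunk_id, None) is not None:
                removed += 1
        return removed


window = ContextWindow(max_tokens=5)
window.read("a", "one two three")        # fits: 3 of 5 tokens used
fits = window.read("b", "four five six")  # rejected: would need 6 tokens
window.prune(["a"])                       # frees the budget
fits_after_prune = window.read("b", "four five six")
```

In this sketch a read that exceeds the budget is simply rejected, so the agent has to prune before it can read more.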
Quick test
curl -X POST https://aman045-openenv-search-rl.hf.space/reset
curl -X POST https://aman045-openenv-search-rl.hf.space/step \
-H "Content-Type: application/json" \
-d '{"action":{"action_type":"search","search":{"query":"Instagram","top_k":3}}}'
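The same search request can be assembled in Python. The sketch below only builds the URL, headers, and JSON body that mirror the curl call above; the helper names `build_search_payload` and `build_step_request` are hypothetical, and nothing is sent over the network.

```python
import json

BASE_URL = "https://aman045-openenv-search-rl.hf.space"  # live demo URL from above


def build_search_payload(query, top_k=3):
    """Build the JSON structure used by the curl example for a search action."""
    return {
        "action": {
            "action_type": "search",
            "search": {"query": query, "top_k": top_k},
        }
    }


def build_step_request(query, top_k=3):
    """Return (url, headers, body) for POST /step without sending anything."""
    body = json.dumps(build_search_payload(query, top_k))
    headers = {"Content-Type": "application/json"}
    return f"{BASE_URL}/step", headers, body


url, headers, body = build_step_request("Instagram")
print(url)
print(body)
```

To send the request, the tuple can be passed to `urllib.request.Request(url, data=body.encode(), headers=headers, method="POST")` and opened with `urllib.request.urlopen`.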
Usage
from searcharena import SearchEnv, SearchAction
env = SearchEnv.from_docker_image("search_env:latest")
obs = env.reset()
print(obs.question)
# search -> read -> answer
obs = env.step(SearchAction.make_search("Facebook acquisition Instagram"))
results = obs.action_result["results"]
obs = env.step(SearchAction.make_read([results[0]["chunk_id"]]))
obs = env.step(SearchAction.make_answer("2012"))
print(f"Reward: {obs.reward}")
env.close()
Reward
| Component | Weight | What it measures |
|---|---|---|
| F-beta | 0.7 | Gold chunks in final context (β=4, recall-heavy) |
| Trajectory | 0.3 | Gold chunks seen at any point |
| Answer bonus | 1.0 | Correct answer with evidence |
Penalties apply for excessive steps and repeated pruning.
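To make the table concrete, here is a rough sketch of how the components might combine. The F-beta formula is the standard one; everything else is an assumption: that the components are summed, that the trajectory term is the fraction of gold chunks ever seen, and that "with evidence" means at least one gold chunk is in the final context. The function names are hypothetical, and the step and prune penalties are left out.

```python
def f_beta(precision, recall, beta=4.0):
    """Standard F-beta score; beta > 1 weights recall more than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def sketch_reward(context_ids, seen_ids, gold_ids, answer_correct,
                  beta=4.0, f_beta_weight=0.7, trajectory_weight=0.3,
                  answer_bonus=1.0):
    """Illustrative reward: weighted F-beta + trajectory + answer bonus.

    Assumption: components are added together; penalties are omitted.
    """
    gold = set(gold_ids)
    context = set(context_ids)
    hits = len(gold & context)
    precision = hits / len(context) if context else 0.0
    recall = hits / len(gold) if gold else 0.0
    # Assumption: trajectory term = fraction of gold chunks seen at any point.
    trajectory = len(gold & set(seen_ids)) / len(gold) if gold else 0.0
    reward = (f_beta_weight * f_beta(precision, recall, beta)
              + trajectory_weight * trajectory)
    # Assumption: "with evidence" = at least one gold chunk in final context.
    if answer_correct and hits > 0:
        reward += answer_bonus
    return reward


# With beta=4, full recall at half precision beats the reverse by a wide margin.
recall_heavy = f_beta(precision=0.5, recall=1.0)
precision_heavy = f_beta(precision=1.0, recall=0.5)
example = sketch_reward(["a", "b"], ["a", "b", "c"], ["a"], answer_correct=True)
```

With β=4, keeping an extra irrelevant chunk in context costs little, while dropping a gold chunk costs a lot, which is the sense in which the score is recall-heavy.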
Tasks
Three difficulty levels:
- Easy: Single fact lookup
- Medium: Two-hop reasoning
- Hard: Multi-constraint, cross-document
Run locally
uv sync
uvicorn server.app:app --reload
Or with Docker:
docker build -t search_env .
docker run -p 8000:8000 search_env
Project structure
searcharena/ # Core package
engine/ # Environment logic
training/ # Training utilities
models.py # Pydantic models
server/ # FastAPI wrapper
data/ # Documents and tasks
inference.py # Baseline agent
Config
SearchEnvConfig(
max_steps=20,
max_context_tokens=32768,
beta=4.0,
f_beta_weight=0.7,
trajectory_reward_weight=0.3,
)
API
- POST /reset - New episode
- POST /step - Execute action
- GET /health - Health check
- WS /ws - WebSocket