--- title: Search RL Environment sdk: docker pinned: false app_port: 8000 base_path: /web tags: - openenv --- # Search RL Environment RL environment for multi-hop document retrieval with explicit context management. Uses F-beta reward curriculum from the Context-1 paper. ## What it does Agent searches a corpus, reads documents into a limited context window, prunes irrelevant content, and submits an answer. Reward is based on finding the right documents and answering correctly. **Actions:** `search`, `read`, `prune`, `answer` **Live demo:** https://aman045-openenv-search-rl.hf.space ## Quick test ```bash curl -X POST https://aman045-openenv-search-rl.hf.space/reset curl -X POST https://aman045-openenv-search-rl.hf.space/step \ -H "Content-Type: application/json" \ -d '{"action":{"action_type":"search","search":{"query":"Instagram","top_k":3}}}' ``` ## Usage ```python from searcharena import SearchEnv, SearchAction env = SearchEnv.from_docker_image("search_env:latest") obs = env.reset() print(obs.question) # search -> read -> answer obs = env.step(SearchAction.make_search("Facebook acquisition Instagram")) results = obs.action_result["results"] obs = env.step(SearchAction.make_read([results[0]["chunk_id"]])) obs = env.step(SearchAction.make_answer("2012")) print(f"Reward: {obs.reward}") env.close() ``` ## Reward | Component | Weight | What it measures | |-----------|--------|------------------| | F-beta | 0.7 | Gold chunks in final context (β=4, recall-heavy) | | Trajectory | 0.3 | Gold chunks seen at any point | | Answer bonus | 1.0 | Correct answer with evidence | Penalties for excessive steps and repeated pruning. ## Tasks Three difficulty levels: - **Easy**: Single fact lookup - **Medium**: Two-hop reasoning - **Hard**: Multi-constraint, cross-document ## Run locally ```bash uv sync uvicorn server.app:app --reload ``` Or with Docker: ```bash docker build -t search_env . docker run -p 8000:8000 search_env ``` ## Project structure ``` searcharena/ # Core package engine/ # Environment logic training/ # Training utilities models.py # Pydantic models server/ # FastAPI wrapper data/ # Documents and tasks inference.py # Baseline agent ``` ## Config ```python SearchEnvConfig( max_steps=20, max_context_tokens=32768, beta=4.0, f_beta_weight=0.7, trajectory_reward_weight=0.3, ) ``` ## API - `POST /reset` - New episode - `POST /step` - Execute action - `GET /health` - Health check - `WS /ws` - WebSocket