---
title: UHC Medical Policy Chatbot
emoji: 🏥
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
---

# UHC Medical Policy Chatbot

A RAG-powered chatbot that answers questions about UnitedHealthcare (UHC) medical policies. Built for doctors, hospital staff, and insurance coordinators who need accurate, cited answers about coverage criteria, CPT/HCPCS codes, and medical necessity requirements.

## Hosted Chatbot

**URL:** [https://huggingface.co/spaces/mxp1404/uhc-policy-chatbot](https://huggingface.co/spaces/mxp1404/uhc-policy-chatbot)

### How to Use — Step-by-Step

1. Open the link above in your browser.
2. Wait for the model to load (the first visit takes ~30 seconds for MedEmbed to initialize).
3. Type your question in the chat input at the bottom — for example:
   - *"Is bariatric surgery covered for BMI over 40?"*
   - *"What documentation is needed for gender-affirming surgery?"*
   - *"Are intrapulmonary percussive ventilation devices covered for home use?"*
4. The chatbot searches the relevant policy chunks, then streams an answer with citations.
5. Click **"📚 Sources"** below each answer to see the exact policy sections used.
6. Enable **"🔊 Read answers aloud"** in the sidebar to hear answers via Kokoro TTS.
7. Use **"🗑️ Clear conversation"** in the sidebar to start a new session.

The chatbot only answers from official UHC policy documents — it will tell you when it doesn't have enough information rather than guessing.
---

## Architecture

### High-Level Design (HLD)

```
┌──────────┐      ┌────────────────────────────────────────────┐
│ Browser  │─────▶│ Streamlit App (HuggingFace Spaces)         │
│ (User)   │◀─────│                                            │
└──────────┘      │  ┌─────────────┐      ┌──────────────────┐ │
                  │  │ MedEmbed    │      │ Groq API         │ │
                  │  │ (1024-dim)  │      │ Llama 3.1 8B     │ │
                  │  │ cached RAM  │      │ 560 tok/s        │ │
                  │  └──────┬──────┘      └────────▲─────────┘ │
                  │         │                      │           │
                  │         ▼                      │           │
                  │  ┌──────────────┐  context + query         │
                  │  │ Qdrant Cloud │──────────────┘           │
                  │  │ (vectors)    │                          │
                  │  └──────────────┘                          │
                  └────────────────────────────────────────────┘
```

**Data flow for each query:**

1. The user types a question in the Streamlit chat interface.
2. The query is encoded into a 1024-dimensional vector using **MedEmbed** (loaded once, cached in memory).
3. The vector is sent to **Qdrant Cloud** for similarity search, which returns the top-K policy chunks with metadata.
4. Retrieved chunks are deduplicated, scored with section priority boosts, and formatted into a context block.
5. The context + query + system prompt are sent to the **Groq API** (Llama 3.1 8B) for answer generation.
6. The response is streamed token-by-token back to the user with source citations.
7. If TTS is enabled, the response text is synthesized into audio using **Kokoro ONNX** and played in-browser.

### Low-Level Design (LLD)

#### Project Structure

```
uhc/
├── app.py                      # Streamlit web UI entry point
├── requirements.txt            # Python dependencies
├── .env.example                # Environment variable template
│
├── chatbot/                    # Chatbot application layer
│   ├── config.py               # Centralized config (LLM, retrieval, env vars)
│   ├── retriever.py            # PolicyRetriever: MedEmbed + Qdrant wrapper
│   ├── llm_groq.py             # Groq API client (deployed)
│   ├── llm.py                  # Ollama client (local dev)
│   ├── prompts.py              # System prompt, context formatting, deduplication
│   ├── tts.py                  # Kokoro ONNX text-to-speech
│   └── cli.py                  # CLI interface (local dev)
│
├── embedding/                  # Embedding pipeline
│   └── scripts/
│       ├── config.py           # Embedding model + Qdrant connection config
│       ├── embed_chunks.py     # Generate embeddings from RAG chunks
│       ├── store_qdrant.py     # Upsert embeddings into Qdrant with payload indexes
│       ├── search.py           # Standalone search CLI for testing
│       └── test_retrieval.py   # Batch retrieval evaluation (10 test cases)
│
├── tests/                      # Evaluation suite
│   └── eval_100.py             # 100-prompt retrieval + LLM evaluation
│
└── scraper/                    # Data ingestion pipeline
    ├── download_policies.py    # Scrape PDFs from UHC website
    ├── extract_pdf_text.py     # PDF → structured sections with metadata
    ├── create_rag_chunks.py    # Section-aware semantic chunking
    └── data/processed/
        ├── extracted_sections.json   # Extracted text per policy/section
        └── rag_chunks.json           # Final RAG chunks with metadata
```

#### Module Design

**`chatbot/retriever.py` — PolicyRetriever**

- Loads `abhinand/MedEmbed-large-v0.1` (1024-dim medical embeddings) via `sentence-transformers`
- Connects to Qdrant Cloud; supports both cloud and local Qdrant
- Encodes queries → cosine similarity search → returns `ChunkResult`
dataclasses
- Filters out low-value sections (References, Application) that pollute results
- Section priority boosting (Coverage Rationale +0.04, Coverage Summary +0.03) so authoritative statements rank above clinical studies
- Retry logic with exponential backoff for transient Qdrant errors

**`chatbot/prompts.py` — Prompt Engineering**

- System prompt enforces: answer from context only, 2–4 bullet points, cite sources, coverage-awareness
- `deduplicate_chunks()` keeps the highest-scoring chunk per (policy, section) pair
- `format_context()` truncates each chunk to 800 chars at sentence boundaries and caps the total at 6,000 chars
- Coverage Rationale is explicitly marked as authoritative for coverage decisions

**`chatbot/llm_groq.py` — GroqClient**

- Uses the `groq` Python SDK with streaming chat completions
- Graceful rate-limit handling (Groq free tier: 250K TPM)
- Exposes the same `chat_stream()` / `chat()` interface as the Ollama client, so the two are interchangeable

**`chatbot/tts.py` — Text-to-Speech**

- Uses [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M-parameter model, ~300MB)
- Auto-downloads model files from HuggingFace Hub on first use
- Generates WAV audio from the LLM response text, played in-browser via `st.audio`
- Toggleable via a sidebar switch — disabled by default to save resources

**`scraper/extract_pdf_text.py` — PDF Extraction**

- Paragraph-level extraction using `pdfplumber` (not line-by-line)
- Robust header/footer/sidebar removal with regex patterns
- Structured metadata parsing: policy number, effective date, plan type, document type
- Table extraction support; skips boilerplate sections and HTML-disguised files

**`scraper/create_rag_chunks.py` — Semantic Chunking**

- Section-aware chunking with a different strategy per section type:
  - Coverage Rationale → criteria-based splitting
  - Applicable Codes → table-aware chunking
  - Clinical Evidence → study-based splitting
  - Others → paragraph-aware with sentence-boundary overlap
- Rich metadata per
chunk: policy name, section, plan type, page range, provider
- Deterministic chunk IDs for deduplication during re-indexing

**`embedding/scripts/embed_chunks.py` — Embedding Generation**

- Prepends metadata to the chunk text before encoding for better retrieval
- Batch processing (32 chunks at a time) with GPU/MPS/CPU auto-detection
- Saves to `.npz` for efficient storage and reloading

**`embedding/scripts/store_qdrant.py` — Vector Storage**

- Creates a Qdrant collection with cosine distance
- Upserts embeddings with full metadata payloads
- Creates payload indexes on `section`, `policy_name`, `plan_type`, `doc_type`, `provider` for efficient filtered search

#### Edge Cases Handled

| Edge Case | Handling |
|---|---|
| Empty / whitespace query | Warning message, no API call |
| Qdrant connection failure | Retry with exponential backoff (3 attempts) |
| Groq rate limit (429) | Caught and shown as a user-friendly message |
| No relevant chunks found | "I don't have enough policy information" |
| Coverage vs. evidence conflict | System prompt + Coverage Rationale boost ensures the correct answer |
| Very long conversation | History trimmed to the last 3 turns |
| Model loading on first visit | Spinner shown; cached with `st.cache_resource` |

---

## Extending for Other Insurance Providers

The system is designed for multi-provider extensibility:

1. **Data layer**: Each chunk in Qdrant has a `provider` field (currently `"UnitedHealthcare"`). Adding a new provider means running the same pipeline with a new provider slug — chunks coexist in the same collection.
2. **Scraper**: `scraper/download_policies.py` can be adapted for any provider's website. The extractor and chunker handle standard medical policy PDF structures.
3. **Embedding**: The same MedEmbed model works for all medical content. New provider chunks are embedded and upserted alongside existing ones.
4. **Retrieval**: Add a `provider_filter` parameter to narrow results by provider, or query across all providers simultaneously.
5. **UI**: Add a provider selector dropdown in the Streamlit sidebar — a one-line change.

```python
# Example: adding Aetna
retriever.retrieve(query, provider_filter="aetna")
```

---

## Local Development Setup

```bash
# 1. Clone the repo
git clone https://github.com//uhc-policy-chatbot.git
cd uhc-policy-chatbot

# 2. Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment variables
cp .env.example .env
# Edit .env with your Qdrant and Groq API keys

# 5. Run the Streamlit app
streamlit run app.py

# Or use the CLI with Ollama (local LLM)
ollama serve &
ollama pull phi3.5
python -m chatbot.cli
```

### Environment Variables

| Variable | Description | Required |
|---|---|---|
| `QDRANT_URL` | Qdrant Cloud cluster URL | Yes |
| `QDRANT_API_KEY` | Qdrant Cloud API key | Yes |
| `QDRANT_COLLECTION` | Collection name (default: `uhc_policies`) | No |
| `GROQ_API_KEY` | Groq API key ([get a free key](https://console.groq.com/keys)) | Yes (web) |
| `GROQ_MODEL` | Groq model (default: `llama-3.1-8b-instant`) | No |

---

## Tech Stack

| Component | Technology |
|---|---|
| Embedding Model | [MedEmbed-large-v0.1](https://huggingface.co/abhinand/MedEmbed-large-v0.1) (1024-dim) |
| Vector Database | [Qdrant Cloud](https://qdrant.tech/) |
| LLM (deployed) | [Llama 3.1 8B](https://console.groq.com/) via Groq (560 tok/s) |
| LLM (local dev) | Phi-3.5 Mini via Ollama |
| Web Framework | Streamlit |
| Hosting | HuggingFace Spaces (free tier) |
| Text-to-Speech | [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M) |
| PDF Extraction | pdfplumber + BeautifulSoup |
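
---

## Appendix: Retrieval Post-Processing Sketch

The retrieval post-processing rules described above (section priority boosts, deduplication per (policy, section) pair, and sentence-boundary truncation) can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the `ChunkResult` fields, constants, and helper names are assumptions for the example.

```python
# Illustrative sketch of the post-retrieval rules: boost authoritative
# sections, keep the best chunk per (policy, section), then truncate.
# Names and signatures are hypothetical, not the repo's real API.
from dataclasses import dataclass

# Boosts mirror the values stated in the module design.
SECTION_BOOSTS = {"Coverage Rationale": 0.04, "Coverage Summary": 0.03}


@dataclass
class ChunkResult:
    policy_name: str
    section: str
    text: str
    score: float  # cosine similarity returned by the vector search


def boost(chunk: ChunkResult) -> float:
    """Priority-boosted score so coverage statements outrank studies."""
    return chunk.score + SECTION_BOOSTS.get(chunk.section, 0.0)


def deduplicate_chunks(chunks: list[ChunkResult]) -> list[ChunkResult]:
    """Keep only the highest-scoring chunk per (policy, section) pair."""
    best: dict = {}
    for c in chunks:
        key = (c.policy_name, c.section)
        if key not in best or boost(c) > boost(best[key]):
            best[key] = c
    return sorted(best.values(), key=boost, reverse=True)


def format_context(chunks: list[ChunkResult],
                   per_chunk: int = 800, total: int = 6000) -> str:
    """Truncate each chunk at a sentence boundary; cap the whole block."""
    parts, used = [], 0
    for c in chunks:
        text = c.text
        if len(text) > per_chunk:
            cut = text.rfind(". ", 0, per_chunk)
            text = text[:cut + 1] if cut != -1 else text[:per_chunk]
        entry = f"[{c.policy_name} | {c.section}]\n{text}"
        if used + len(entry) > total:
            break
        parts.append(entry)
        used += len(entry)
    return "\n\n".join(parts)
```

With these rules, a Coverage Rationale chunk scoring 0.80 (boosted to 0.84) ranks ahead of a Clinical Evidence chunk scoring 0.82, which is the "coverage vs. evidence conflict" behavior listed under edge cases.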