Spaces:
Sleeping
A newer version of the Streamlit SDK is available: 1.56.0
title: UHC Medical Policy Chatbot
emoji: π₯
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
UHC Medical Policy Chatbot
A RAG-powered chatbot that answers questions about UnitedHealthcare (UHC) medical policies. Built for doctors, hospital staff, and insurance coordinators who need accurate, cited answers about coverage criteria, CPT/HCPCS codes, and medical necessity requirements.
Hosted Chatbot
URL: https://huggingface.co/spaces/mxp1404/uhc-policy-chatbot
How to Use β Step-by-Step
- Open the link above in your browser.
- Wait for the model to load (first visit takes ~30 seconds for MedEmbed to initialize).
- Type your question in the chat input at the bottom β for example:
- "Is bariatric surgery covered for BMI over 40?"
- "What documentation is needed for gender-affirming surgery?"
- "Are intrapulmonary percussive ventilation devices covered for home use?"
- The chatbot will search relevant policy chunks, then stream an answer with citations.
- Click "π Sources" below each answer to see the exact policy sections used.
- Enable "π Read answers aloud" in the sidebar to hear answers via Kokoro TTS.
- Use "ποΈ Clear conversation" in the sidebar to start a new session.
The chatbot only answers from official UHC policy documents β it will tell you if it doesn't have enough information rather than guessing.
Architecture
High-Level Design (HLD)
βββββββββββββββ ββββββββββββββββββββββββββββββββββββββββββββββββ
β Browser ββββββΆβ Streamlit App (HuggingFace Spaces) β
β (User) βββββββ β
βββββββββββββββ β βββββββββββββββ βββββββββββββββββββββββ β
β β MedEmbed β β Groq API β β
β β (1024-dim) β β Llama 3.1 8B β β
β β cached RAM β β 560 tok/s β β
β ββββββββ¬βββββββ ββββββββ²βββββββββββββββ β
β β β β
β βΌ β β
β βββββββββββββββ context + query β
β β Qdrant Cloudββββββββββββββ β
β β (vectors) β β
β βββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββ
Data flow for each query:
- User types a question in the Streamlit chat interface
- The query is encoded into a 1024-dimensional vector using MedEmbed (loaded once, cached in memory)
- The vector is sent to Qdrant Cloud for similarity search β returns top-K policy chunks with metadata
- Retrieved chunks are deduplicated, scored with section priority boosts, and formatted into a context block
- The context + query + system prompt are sent to Groq API (Llama 3.1 8B) for answer generation
- The response is streamed token-by-token back to the user with source citations
- If TTS is enabled, the response text is synthesized into audio using Kokoro ONNX and played in-browser
Low-Level Design (LLD)
Project Structure
uhc/
βββ app.py # Streamlit web UI entry point
βββ requirements.txt # Python dependencies
βββ .env.example # Environment variable template
β
βββ chatbot/ # Chatbot application layer
β βββ config.py # Centralized config (LLM, retrieval, env vars)
β βββ retriever.py # PolicyRetriever: MedEmbed + Qdrant wrapper
β βββ llm_groq.py # Groq API client (deployed)
β βββ llm.py # Ollama client (local dev)
β βββ prompts.py # System prompt, context formatting, deduplication
β βββ tts.py # Kokoro ONNX text-to-speech
β βββ cli.py # CLI interface (local dev)
β
βββ embedding/ # Embedding pipeline
β βββ scripts/
β βββ config.py # Embedding model + Qdrant connection config
β βββ embed_chunks.py # Generate embeddings from RAG chunks
β βββ store_qdrant.py # Upsert embeddings into Qdrant with payload indexes
β βββ search.py # Standalone search CLI for testing
β βββ test_retrieval.py # Batch retrieval evaluation (10 test cases)
β
βββ tests/ # Evaluation suite
β βββ eval_100.py # 100-prompt retrieval + LLM evaluation
β
βββ scraper/ # Data ingestion pipeline
βββ download_policies.py # Scrape PDFs from UHC website
βββ extract_pdf_text.py # PDF β structured sections with metadata
βββ create_rag_chunks.py # Section-aware semantic chunking
βββ data/processed/
βββ extracted_sections.json # Extracted text per policy/section
βββ rag_chunks.json # Final RAG chunks with metadata
Module Design
chatbot/retriever.py β PolicyRetriever
- Loads
abhinand/MedEmbed-large-v0.1(1024-dim medical embeddings) viasentence-transformers - Connects to Qdrant Cloud; supports both cloud and local Qdrant
- Encodes queries β cosine similarity search β returns
ChunkResultdataclasses - Filters out low-value sections (References, Application) that pollute results
- Section priority boosting (Coverage Rationale +0.04, Coverage Summary +0.03) so authoritative statements rank above clinical studies
- Retry logic with exponential backoff for transient Qdrant errors
chatbot/prompts.py β Prompt Engineering
- System prompt enforces: answer from context only, 2β4 bullet points, cite sources, coverage-awareness
deduplicate_chunks()keeps highest-scoring chunk per (policy, section) pairformat_context()truncates each chunk to 800 chars at sentence boundaries, caps total at 6000 chars- Coverage Rationale is explicitly marked as authoritative for coverage decisions
chatbot/llm_groq.py β GroqClient
- Uses
groqPython SDK with streaming chat completions - Graceful rate-limit handling (Groq free tier: 250K TPM)
- Same
chat_stream()/chat()interface as the Ollama client for interchangeability
chatbot/tts.py β Text-to-Speech
- Uses Kokoro ONNX (82M parameter model, ~300MB)
- Auto-downloads model files from HuggingFace Hub on first use
- Generates WAV audio from LLM response text, played in-browser via
st.audio - Toggleable via sidebar switch β disabled by default to save resources
scraper/extract_pdf_text.py β PDF Extraction
- Paragraph-level extraction using
pdfplumber(not line-by-line) - Robust header/footer/sidebar removal with regex patterns
- Structured metadata parsing: policy number, effective date, plan type, document type
- Table extraction support; skips boilerplate sections and HTML-disguised files
scraper/create_rag_chunks.py β Semantic Chunking
- Section-aware chunking: different strategies per section type
- Coverage Rationale β criteria-based splitting
- Applicable Codes β table-aware chunking
- Clinical Evidence β study-based splitting
- Others β paragraph-aware with sentence-boundary overlap
- Rich metadata per chunk: policy name, section, plan type, page range, provider
- Deterministic chunk IDs for deduplication during re-indexing
embedding/scripts/embed_chunks.py β Embedding Generation
- Prepends metadata to chunk text before encoding for better retrieval
- Batch processing (32 chunks at a time) with GPU/MPS/CPU auto-detection
- Saves to
.npzfor efficient storage and reloading
embedding/scripts/store_qdrant.py β Vector Storage
- Creates Qdrant collection with cosine distance
- Upserts embeddings with full metadata payloads
- Creates payload indexes on
section,policy_name,plan_type,doc_type,providerfor efficient filtered search
Edge Cases Handled
| Edge Case | Handling |
|---|---|
| Empty / whitespace query | Warning message, no API call |
| Qdrant connection failure | Retry with exponential backoff (3 attempts) |
| Groq rate limit (429) | Caught and shown as user-friendly message |
| No relevant chunks found | "I don't have enough policy information" |
| Coverage vs. evidence conflict | System prompt + Coverage Rationale boost ensures correct answer |
| Very long conversation | History trimmed to last 3 turns |
| Model loading on first visit | Spinner shown; cached with st.cache_resource |
Extending for Other Insurance Providers
The system is designed for multi-provider extensibility:
Data layer: Each chunk in Qdrant has a
providerfield (currently"UnitedHealthcare"). Adding a new provider means running the same pipeline with a new provider slug β chunks coexist in the same collection.Scraper:
scraper/download_policies.pycan be adapted for any provider's website. The extractor and chunker handle standard medical policy PDF structures.Embedding: The same MedEmbed model works for all medical content. New provider chunks are embedded and upserted alongside existing ones.
Retrieval: Add a
provider_filterparameter to narrow results by provider, or query across all providers simultaneously.UI: Add a provider selector dropdown in the Streamlit sidebar β one line change.
# Example: adding Aetna
retriever.retrieve(query, provider_filter="aetna")
Local Development Setup
# 1. Clone the repo
git clone https://github.com/<your-username>/uhc-policy-chatbot.git
cd uhc-policy-chatbot
# 2. Create virtual environment
python3 -m venv venv
source venv/bin/activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Configure environment variables
cp .env.example .env
# Edit .env with your Qdrant and Groq API keys
# 5. Run the Streamlit app
streamlit run app.py
# Or use the CLI with Ollama (local LLM)
ollama serve &
ollama pull phi3.5
python -m chatbot.cli
Environment Variables
| Variable | Description | Required |
|---|---|---|
QDRANT_URL |
Qdrant Cloud cluster URL | Yes |
QDRANT_API_KEY |
Qdrant Cloud API key | Yes |
QDRANT_COLLECTION |
Collection name (default: uhc_policies) |
No |
GROQ_API_KEY |
Groq API key (get free) | Yes (web) |
GROQ_MODEL |
Groq model (default: llama-3.1-8b-instant) |
No |
Tech Stack
| Component | Technology |
|---|---|
| Embedding Model | MedEmbed-large-v0.1 (1024-dim) |
| Vector Database | Qdrant Cloud |
| LLM (deployed) | Llama 3.1 8B via Groq (560 tok/s) |
| LLM (local dev) | Phi-3.5 Mini via Ollama |
| Web Framework | Streamlit |
| Hosting | HuggingFace Spaces (free tier) |
| Text-to-Speech | Kokoro ONNX (82M) |
| PDF Extraction | pdfplumber + BeautifulSoup |