---
title: UHC Medical Policy Chatbot
emoji: 🏥
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
---

# UHC Medical Policy Chatbot

A RAG-powered chatbot that answers questions about UnitedHealthcare (UHC) medical policies. Built for doctors, hospital staff, and insurance coordinators who need accurate, cited answers about coverage criteria, CPT/HCPCS codes, and medical necessity requirements.

## Hosted Chatbot

**URL:** [https://huggingface.co/spaces/mxp1404/uhc-policy-chatbot](https://huggingface.co/spaces/mxp1404/uhc-policy-chatbot)

### How to Use — Step-by-Step

1. Open the link above in your browser.
2. Wait for the model to load (the first visit takes ~30 seconds for MedEmbed to initialize).
3. Type your question in the chat input at the bottom — for example:
   - *"Is bariatric surgery covered for BMI over 40?"*
   - *"What documentation is needed for gender-affirming surgery?"*
   - *"Are intrapulmonary percussive ventilation devices covered for home use?"*
4. The chatbot searches the relevant policy chunks, then streams an answer with citations.
5. Click **"📚 Sources"** below each answer to see the exact policy sections used.
6. Enable **"🔊 Read answers aloud"** in the sidebar to hear answers via Kokoro TTS.
7. Use **"🗑️ Clear conversation"** in the sidebar to start a new session.

The chatbot only answers from official UHC policy documents — it will tell you when it doesn't have enough information rather than guessing.
---

## Architecture

### High-Level Design (HLD)

```
┌──────────┐      ┌────────────────────────────────────────────┐
│ Browser  │─────▶│ Streamlit App (HuggingFace Spaces)         │
│ (User)   │◀─────│                                            │
└──────────┘      │  ┌─────────────┐      ┌──────────────────┐ │
                  │  │ MedEmbed    │      │ Groq API         │ │
                  │  │ (1024-dim)  │      │ Llama 3.1 8B     │ │
                  │  │ cached RAM  │      │ 560 tok/s        │ │
                  │  └──────┬──────┘      └────────▲─────────┘ │
                  │         │                      │           │
                  │         ▼                      │           │
                  │  ┌──────────────┐  context + query         │
                  │  │ Qdrant Cloud │──────────────┘           │
                  │  │ (vectors)    │                          │
                  │  └──────────────┘                          │
                  └────────────────────────────────────────────┘
```

**Data flow for each query:**

1. The user types a question in the Streamlit chat interface.
2. The query is encoded into a 1024-dimensional vector using **MedEmbed** (loaded once, cached in memory).
3. The vector is sent to **Qdrant Cloud** for similarity search, which returns the top-K policy chunks with metadata.
4. Retrieved chunks are deduplicated, scored with section priority boosts, and formatted into a context block.
5. The context + query + system prompt are sent to the **Groq API** (Llama 3.1 8B) for answer generation.
6. The response is streamed token-by-token back to the user with source citations.
7. If TTS is enabled, the response text is synthesized into audio using **Kokoro ONNX** and played in-browser.

### Low-Level Design (LLD)

#### Project Structure

```
uhc/
├── app.py                      # Streamlit web UI entry point
├── requirements.txt            # Python dependencies
├── .env.example                # Environment variable template
│
├── chatbot/                    # Chatbot application layer
│   ├── config.py               # Centralized config (LLM, retrieval, env vars)
│   ├── retriever.py            # PolicyRetriever: MedEmbed + Qdrant wrapper
│   ├── llm_groq.py             # Groq API client (deployed)
│   ├── llm.py                  # Ollama client (local dev)
│   ├── prompts.py              # System prompt, context formatting, deduplication
│   ├── tts.py                  # Kokoro ONNX text-to-speech
│   └── cli.py                  # CLI interface (local dev)
│
├── embedding/                  # Embedding pipeline
│   └── scripts/
│       ├── config.py           # Embedding model + Qdrant connection config
│       ├── embed_chunks.py     # Generate embeddings from RAG chunks
│       ├── store_qdrant.py     # Upsert embeddings into Qdrant with payload indexes
│       ├── search.py           # Standalone search CLI for testing
│       └── test_retrieval.py   # Batch retrieval evaluation (10 test cases)
│
├── tests/                      # Evaluation suite
│   └── eval_100.py             # 100-prompt retrieval + LLM evaluation
│
└── scraper/                    # Data ingestion pipeline
    ├── download_policies.py    # Scrape PDFs from UHC website
    ├── extract_pdf_text.py     # PDF → structured sections with metadata
    ├── create_rag_chunks.py    # Section-aware semantic chunking
    └── data/processed/
        ├── extracted_sections.json   # Extracted text per policy/section
        └── rag_chunks.json           # Final RAG chunks with metadata
```

#### Module Design

**`chatbot/retriever.py` — PolicyRetriever**

- Loads `abhinand/MedEmbed-large-v0.1` (1024-dim medical embeddings) via `sentence-transformers`
- Connects to Qdrant Cloud; supports both cloud and local Qdrant
- Encodes queries → cosine similarity search → returns `ChunkResult`
dataclasses
- Filters out low-value sections (References, Application) that pollute results
- Section priority boosting (Coverage Rationale +0.04, Coverage Summary +0.03) so authoritative statements rank above clinical studies
- Retry logic with exponential backoff for transient Qdrant errors

**`chatbot/prompts.py` — Prompt Engineering**

- System prompt enforces: answer from context only, 2–4 bullet points, cite sources, coverage-awareness
- `deduplicate_chunks()` keeps the highest-scoring chunk per (policy, section) pair
- `format_context()` truncates each chunk to 800 chars at sentence boundaries and caps the total at 6,000 chars
- Coverage Rationale is explicitly marked as authoritative for coverage decisions

**`chatbot/llm_groq.py` — GroqClient**

- Uses the `groq` Python SDK with streaming chat completions
- Graceful rate-limit handling (Groq free tier: 250K TPM)
- Exposes the same `chat_stream()` / `chat()` interface as the Ollama client, so the two are interchangeable

**`chatbot/tts.py` — Text-to-Speech**

- Uses [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M-parameter model, ~300MB)
- Auto-downloads model files from HuggingFace Hub on first use
- Generates WAV audio from the LLM response text, played in-browser via `st.audio`
- Toggleable via a sidebar switch — disabled by default to save resources

**`scraper/extract_pdf_text.py` — PDF Extraction**

- Paragraph-level extraction using `pdfplumber` (not line-by-line)
- Robust header/footer/sidebar removal with regex patterns
- Structured metadata parsing: policy number, effective date, plan type, document type
- Table extraction support; skips boilerplate sections and HTML-disguised files

**`scraper/create_rag_chunks.py` — Semantic Chunking**

- Section-aware chunking with a different strategy per section type:
  - Coverage Rationale → criteria-based splitting
  - Applicable Codes → table-aware chunking
  - Clinical Evidence → study-based splitting
  - Others → paragraph-aware with sentence-boundary overlap
- Rich metadata per
chunk: policy name, section, plan type, page range, provider
- Deterministic chunk IDs for deduplication during re-indexing

**`embedding/scripts/embed_chunks.py` — Embedding Generation**

- Prepends metadata to the chunk text before encoding for better retrieval
- Batch processing (32 chunks at a time) with GPU/MPS/CPU auto-detection
- Saves to `.npz` for efficient storage and reloading

**`embedding/scripts/store_qdrant.py` — Vector Storage**

- Creates a Qdrant collection with cosine distance
- Upserts embeddings with full metadata payloads
- Creates payload indexes on `section`, `policy_name`, `plan_type`, `doc_type`, `provider` for efficient filtered search

#### Edge Cases Handled

| Edge Case | Handling |
|---|---|
| Empty / whitespace query | Warning message, no API call |
| Qdrant connection failure | Retry with exponential backoff (3 attempts) |
| Groq rate limit (429) | Caught and shown as a user-friendly message |
| No relevant chunks found | "I don't have enough policy information" |
| Coverage vs. evidence conflict | System prompt + Coverage Rationale boost ensures the correct answer |
| Very long conversation | History trimmed to the last 3 turns |
| Model loading on first visit | Spinner shown; cached with `st.cache_resource` |

---

## Extending for Other Insurance Providers

The system is designed for multi-provider extensibility:

1. **Data layer**: Each chunk in Qdrant has a `provider` field (currently `"UnitedHealthcare"`). Adding a new provider means running the same pipeline with a new provider slug — chunks coexist in the same collection.
2. **Scraper**: `scraper/download_policies.py` can be adapted for any provider's website. The extractor and chunker handle standard medical policy PDF structures.
3. **Embedding**: The same MedEmbed model works for all medical content. New provider chunks are embedded and upserted alongside existing ones.
4. **Retrieval**: Add a `provider_filter` parameter to narrow results by provider, or query across all providers simultaneously.
5. **UI**: Add a provider selector dropdown in the Streamlit sidebar — a one-line change.

```python
# Example: adding Aetna
retriever.retrieve(query, provider_filter="aetna")
```

---

## Local Development Setup

```bash
# 1. Clone the repo
git clone https://github.com//uhc-policy-chatbot.git
cd uhc-policy-chatbot

# 2. Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment variables
cp .env.example .env
# Edit .env with your Qdrant and Groq API keys

# 5. Run the Streamlit app
streamlit run app.py

# Or use the CLI with Ollama (local LLM)
ollama serve &
ollama pull phi3.5
python -m chatbot.cli
```

### Environment Variables

| Variable | Description | Required |
|---|---|---|
| `QDRANT_URL` | Qdrant Cloud cluster URL | Yes |
| `QDRANT_API_KEY` | Qdrant Cloud API key | Yes |
| `QDRANT_COLLECTION` | Collection name (default: `uhc_policies`) | No |
| `GROQ_API_KEY` | Groq API key ([get a free key](https://console.groq.com/keys)) | Yes (web) |
| `GROQ_MODEL` | Groq model (default: `llama-3.1-8b-instant`) | No |

---

## Tech Stack

| Component | Technology |
|---|---|
| Embedding Model | [MedEmbed-large-v0.1](https://huggingface.co/abhinand/MedEmbed-large-v0.1) (1024-dim) |
| Vector Database | [Qdrant Cloud](https://qdrant.tech/) |
| LLM (deployed) | [Llama 3.1 8B](https://console.groq.com/) via Groq (560 tok/s) |
| LLM (local dev) | Phi-3.5 Mini via Ollama |
| Web Framework | Streamlit |
| Hosting | HuggingFace Spaces (free tier) |
| Text-to-Speech | [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M) |
| PDF Extraction | pdfplumber + BeautifulSoup |
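
---

## Appendix: Retrieval Post-Processing Sketch

The retrieval post-processing rules described above (section priority boosts, deduplication per (policy, section) pair, and sentence-boundary truncation) can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: the `ChunkResult` fields, constants, and helper names are assumptions for the example.

```python
# Illustrative sketch of the post-retrieval rules: boost authoritative
# sections, keep the best chunk per (policy, section), then truncate.
# Names and signatures are hypothetical, not the repo's real API.
from dataclasses import dataclass

# Boosts mirror the values stated in the module design.
SECTION_BOOSTS = {"Coverage Rationale": 0.04, "Coverage Summary": 0.03}


@dataclass
class ChunkResult:
    policy_name: str
    section: str
    text: str
    score: float  # cosine similarity returned by the vector search


def boost(chunk: ChunkResult) -> float:
    """Priority-boosted score so coverage statements outrank studies."""
    return chunk.score + SECTION_BOOSTS.get(chunk.section, 0.0)


def deduplicate_chunks(chunks: list[ChunkResult]) -> list[ChunkResult]:
    """Keep only the highest-scoring chunk per (policy, section) pair."""
    best: dict = {}
    for c in chunks:
        key = (c.policy_name, c.section)
        if key not in best or boost(c) > boost(best[key]):
            best[key] = c
    return sorted(best.values(), key=boost, reverse=True)


def format_context(chunks: list[ChunkResult],
                   per_chunk: int = 800, total: int = 6000) -> str:
    """Truncate each chunk at a sentence boundary; cap the whole block."""
    parts, used = [], 0
    for c in chunks:
        text = c.text
        if len(text) > per_chunk:
            cut = text.rfind(". ", 0, per_chunk)
            text = text[:cut + 1] if cut != -1 else text[:per_chunk]
        entry = f"[{c.policy_name} | {c.section}]\n{text}"
        if used + len(entry) > total:
            break
        parts.append(entry)
        used += len(entry)
    return "\n\n".join(parts)
```

With these rules, a Coverage Rationale chunk scoring 0.80 (boosted to 0.84) ranks ahead of a Clinical Evidence chunk scoring 0.82, which is the "coverage vs. evidence conflict" behavior listed under edge cases.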