---
title: UHC Medical Policy Chatbot
emoji: 🏥
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
---

# UHC Medical Policy Chatbot

A RAG-powered chatbot that answers questions about UnitedHealthcare (UHC) medical policies. Built for doctors, hospital staff, and insurance coordinators who need accurate, cited answers about coverage criteria, CPT/HCPCS codes, and medical necessity requirements.

## Hosted Chatbot

**URL:** https://huggingface.co/spaces/mxp1404/uhc-policy-chatbot

## How to Use: Step by Step

1. Open the link above in your browser.
2. Wait for the model to load (the first visit takes ~30 seconds for MedEmbed to initialize).
3. Type your question in the chat input at the bottom, for example:
   - "Is bariatric surgery covered for BMI over 40?"
   - "What documentation is needed for gender-affirming surgery?"
   - "Are intrapulmonary percussive ventilation devices covered for home use?"
4. The chatbot will search relevant policy chunks, then stream an answer with citations.
5. Click "📚 Sources" below each answer to see the exact policy sections used.
6. Enable "🔊 Read answers aloud" in the sidebar to hear answers via Kokoro TTS.
7. Use "🗑️ Clear conversation" in the sidebar to start a new session.

The chatbot answers only from official UHC policy documents; it will say it does not have enough information rather than guess.


## Architecture

### High-Level Design (HLD)

```
┌─────────────┐     ┌──────────────────────────────────────────────┐
│   Browser   │────▶│  Streamlit App (HuggingFace Spaces)          │
│   (User)    │◀────│                                              │
└─────────────┘     │  ┌─────────────┐    ┌─────────────────────┐  │
                    │  │ MedEmbed    │    │ Groq API            │  │
                    │  │ (1024-dim)  │    │ Llama 3.1 8B        │  │
                    │  │ cached RAM  │    │ 560 tok/s           │  │
                    │  └──────┬──────┘    └──────▲──────────────┘  │
                    │         │                  │                 │
                    │         ▼                  │                 │
                    │  ┌─────────────┐   context + query           │
                    │  │ Qdrant Cloud│───────────┘                 │
                    │  │ (vectors)   │                             │
                    │  └─────────────┘                             │
                    └──────────────────────────────────────────────┘
```

Data flow for each query:

1. The user types a question in the Streamlit chat interface.
2. The query is encoded into a 1024-dimensional vector using MedEmbed (loaded once, cached in memory).
3. The vector is sent to Qdrant Cloud for similarity search, which returns the top-K policy chunks with metadata.
4. Retrieved chunks are deduplicated, scored with section priority boosts, and formatted into a context block.
5. The context + query + system prompt are sent to the Groq API (Llama 3.1 8B) for answer generation.
6. The response is streamed token-by-token back to the user with source citations.
7. If TTS is enabled, the response text is synthesized into audio using Kokoro ONNX and played in-browser.
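The per-query flow above can be sketched with injected callables standing in for MedEmbed, Qdrant, and Groq. All names here are illustrative, not the app's actual API:

```python
from typing import Callable, Dict, List


def answer_query(
    query: str,
    encode: Callable[[str], List[float]],              # stand-in for MedEmbed
    search: Callable[[List[float], int], List[Dict]],  # stand-in for Qdrant
    generate: Callable[[str, str], str],               # stand-in for Groq
    top_k: int = 5,
) -> str:
    """Minimal sketch of the flow: encode -> search -> format context -> generate."""
    vector = encode(query)
    hits = search(vector, top_k)
    # Label each chunk with its policy/section so the LLM can cite sources.
    context = "\n\n".join(
        f"[{h['policy']} / {h['section']}]\n{h['text']}" for h in hits
    )
    return generate(context, query)
```

In the real app the deduplication and section boosting described below happen between the search and format steps.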

### Low-Level Design (LLD)

#### Project Structure

```
uhc/
├── app.py                          # Streamlit web UI entry point
├── requirements.txt                # Python dependencies
├── .env.example                    # Environment variable template
│
├── chatbot/                        # Chatbot application layer
│   ├── config.py                   # Centralized config (LLM, retrieval, env vars)
│   ├── retriever.py                # PolicyRetriever: MedEmbed + Qdrant wrapper
│   ├── llm_groq.py                 # Groq API client (deployed)
│   ├── llm.py                      # Ollama client (local dev)
│   ├── prompts.py                  # System prompt, context formatting, deduplication
│   ├── tts.py                      # Kokoro ONNX text-to-speech
│   └── cli.py                      # CLI interface (local dev)
│
├── embedding/                      # Embedding pipeline
│   └── scripts/
│       ├── config.py               # Embedding model + Qdrant connection config
│       ├── embed_chunks.py         # Generate embeddings from RAG chunks
│       ├── store_qdrant.py         # Upsert embeddings into Qdrant with payload indexes
│       ├── search.py               # Standalone search CLI for testing
│       └── test_retrieval.py       # Batch retrieval evaluation (10 test cases)
│
├── tests/                          # Evaluation suite
│   └── eval_100.py                 # 100-prompt retrieval + LLM evaluation
│
└── scraper/                        # Data ingestion pipeline
    ├── download_policies.py        # Scrape PDFs from UHC website
    ├── extract_pdf_text.py         # PDF → structured sections with metadata
    ├── create_rag_chunks.py        # Section-aware semantic chunking
    └── data/processed/
        ├── extracted_sections.json # Extracted text per policy/section
        └── rag_chunks.json         # Final RAG chunks with metadata
```

#### Module Design

**chatbot/retriever.py – PolicyRetriever**

- Loads `abhinand/MedEmbed-large-v0.1` (1024-dim medical embeddings) via sentence-transformers
- Connects to Qdrant Cloud; supports both cloud and local Qdrant
- Encodes queries → cosine similarity search → returns `ChunkResult` dataclasses
- Filters out low-value sections (References, Application) that pollute results
- Section priority boosting (Coverage Rationale +0.04, Coverage Summary +0.03) so authoritative statements rank above clinical studies
- Retry logic with exponential backoff for transient Qdrant errors
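The filtering and boosting steps can be sketched as a pure rerank function. The constants match the values above; the dict shape of a hit is illustrative:

```python
SECTION_BOOSTS = {"Coverage Rationale": 0.04, "Coverage Summary": 0.03}
EXCLUDED_SECTIONS = {"References", "Application"}


def rerank(hits: list) -> list:
    """Drop low-value sections, apply priority boosts, and re-sort by score."""
    kept = [dict(h) for h in hits if h["section"] not in EXCLUDED_SECTIONS]
    for h in kept:
        h["score"] += SECTION_BOOSTS.get(h["section"], 0.0)
    return sorted(kept, key=lambda h: h["score"], reverse=True)
```

The small boost is enough to lift a Coverage Rationale chunk above a slightly higher-scoring Clinical Evidence chunk without overriding large similarity differences.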

**chatbot/prompts.py – Prompt Engineering**

- System prompt enforces: answer from context only, 2–4 bullet points, cite sources, coverage-awareness
- `deduplicate_chunks()` keeps the highest-scoring chunk per (policy, section) pair
- `format_context()` truncates each chunk to 800 chars at sentence boundaries and caps the total at 6,000 chars
- Coverage Rationale is explicitly marked as authoritative for coverage decisions
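A minimal sketch of the deduplication step, assuming chunks are plain dicts with `policy`, `section`, and `score` keys (the repo uses dataclasses):

```python
def deduplicate_chunks(chunks: list) -> list:
    """Keep only the highest-scoring chunk per (policy, section) pair."""
    best = {}
    for c in chunks:
        key = (c["policy"], c["section"])
        if key not in best or c["score"] > best[key]["score"]:
            best[key] = c
    return sorted(best.values(), key=lambda c: c["score"], reverse=True)
```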

**chatbot/llm_groq.py – GroqClient**

- Uses the `groq` Python SDK with streaming chat completions
- Graceful rate-limit handling (Groq free tier: 250K TPM)
- Same `chat_stream()` / `chat()` interface as the Ollama client for interchangeability
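A sketch of what a streaming call looks like with the `groq` SDK. The `build_messages` helper and the exact history-trimming policy are illustrative, not the repo's code; `GROQ_API_KEY` is read from the environment:

```python
def build_messages(system_prompt, history, context, query):
    """Assemble the chat payload: system prompt, trimmed history, context + question."""
    messages = [{"role": "system", "content": system_prompt}]
    messages += history[-6:]  # last 3 user/assistant turns
    messages.append(
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
    )
    return messages


def chat_stream(messages, model="llama-3.1-8b-instant"):
    from groq import Groq  # deferred so build_messages works without the SDK

    client = Groq()  # reads GROQ_API_KEY from the environment
    stream = client.chat.completions.create(model=model, messages=messages, stream=True)
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta
```

Yielding token deltas lets the Streamlit UI render the answer incrementally via `st.write_stream`-style consumption.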

**chatbot/tts.py – Text-to-Speech**

- Uses Kokoro ONNX (82M-parameter model, ~300 MB)
- Auto-downloads model files from the Hugging Face Hub on first use
- Generates WAV audio from the LLM response text, played in-browser via `st.audio`
- Toggleable via a sidebar switch; disabled by default to save resources

**scraper/extract_pdf_text.py – PDF Extraction**

- Paragraph-level extraction using pdfplumber (not line-by-line)
- Robust header/footer/sidebar removal with regex patterns
- Structured metadata parsing: policy number, effective date, plan type, document type
- Table extraction support; skips boilerplate sections and HTML-disguised files
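The extraction pattern can be sketched as follows. The boilerplate regex is a hypothetical stand-in; the real scraper's patterns are tailored to UHC's page layout:

```python
import re

# Hypothetical header/footer patterns; the real scraper's regexes differ.
BOILERPLATE = re.compile(
    r"(?m)^\s*(Page \d+ of \d+|UnitedHealthcare [A-Za-z ]*Policy)\s*$"
)


def clean_page_text(text: str) -> str:
    """Strip repeated header/footer lines and collapse leftover blank lines."""
    cleaned = BOILERPLATE.sub("", text)
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()


def extract_pages(pdf_path: str) -> list:
    import pdfplumber  # deferred import; only needed when actually extracting

    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() can return None for image-only pages, hence the `or ""`.
        return [clean_page_text(page.extract_text() or "") for page in pdf.pages]
```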

**scraper/create_rag_chunks.py – Semantic Chunking**

- Section-aware chunking: different strategies per section type
  - Coverage Rationale → criteria-based splitting
  - Applicable Codes → table-aware chunking
  - Clinical Evidence → study-based splitting
  - Others → paragraph-aware with sentence-boundary overlap
- Rich metadata per chunk: policy name, section, plan type, page range, provider
- Deterministic chunk IDs for deduplication during re-indexing
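Deterministic IDs are typically derived by hashing stable chunk attributes, so re-running the pipeline upserts rather than duplicates. The exact key scheme here is illustrative, not the repo's actual format:

```python
import hashlib


def chunk_id(policy_name: str, section: str, index: int, text: str) -> str:
    """Stable ID: same inputs always hash to the same ID across pipeline runs."""
    key = f"{policy_name}|{section}|{index}|{text}"
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]
```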

**embedding/scripts/embed_chunks.py – Embedding Generation**

- Prepends metadata to chunk text before encoding for better retrieval
- Batch processing (32 chunks at a time) with GPU/MPS/CPU auto-detection
- Saves to `.npz` for efficient storage and reloading
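A sketch of the metadata-prefix-then-encode step, assuming chunks are dicts with `policy_name`, `section`, and `text` keys (the prefix format is illustrative):

```python
def enriched_text(chunk: dict) -> str:
    """Prepend metadata so the embedding also captures policy/section context."""
    return f"Policy: {chunk['policy_name']} | Section: {chunk['section']}\n{chunk['text']}"


def embed_chunks(chunks: list, out_path: str = "embeddings.npz"):
    import numpy as np
    from sentence_transformers import SentenceTransformer  # deferred import

    model = SentenceTransformer("abhinand/MedEmbed-large-v0.1")
    vectors = model.encode(
        [enriched_text(c) for c in chunks],
        batch_size=32,  # matches the batch size described above
        show_progress_bar=True,
    )
    np.savez(out_path, vectors=vectors)  # compact storage for later upsert
    return vectors
```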

**embedding/scripts/store_qdrant.py – Vector Storage**

- Creates the Qdrant collection with cosine distance
- Upserts embeddings with full metadata payloads
- Creates payload indexes on `section`, `policy_name`, `plan_type`, `doc_type`, and `provider` for efficient filtered search
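The storage step can be sketched with `qdrant-client`. The payload builder is pure; the connection details are placeholders, and the helper names are illustrative:

```python
INDEXED_FIELDS = ("section", "policy_name", "plan_type", "doc_type", "provider")


def build_payload(chunk: dict) -> dict:
    """Metadata stored alongside each vector; indexed fields enable filtered search."""
    payload = {field: chunk.get(field) for field in INDEXED_FIELDS}
    payload["text"] = chunk["text"]
    return payload


def store(vectors, chunks, collection="uhc_policies"):
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    client = QdrantClient(url="https://<cluster-url>", api_key="<key>")
    client.create_collection(
        collection_name=collection,
        vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
    )
    for field in INDEXED_FIELDS:
        client.create_payload_index(collection, field_name=field, field_schema="keyword")
    client.upsert(
        collection,
        points=[
            PointStruct(id=i, vector=list(map(float, v)), payload=build_payload(c))
            for i, (v, c) in enumerate(zip(vectors, chunks))
        ],
    )
```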

## Edge Cases Handled

| Edge Case | Handling |
|---|---|
| Empty / whitespace query | Warning message, no API call |
| Qdrant connection failure | Retry with exponential backoff (3 attempts) |
| Groq rate limit (429) | Caught and shown as a user-friendly message |
| No relevant chunks found | "I don't have enough policy information" |
| Coverage vs. evidence conflict | System prompt + Coverage Rationale boost ensures the correct answer |
| Very long conversation | History trimmed to the last 3 turns |
| Model loading on first visit | Spinner shown; cached with `st.cache_resource` |
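The retry-with-backoff pattern used for Qdrant failures can be sketched as a small generic helper (names and the caught exception type are illustrative):

```python
import time


def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn, retrying on ConnectionError with exponential backoff (1s, 2s, 4s...).

    Re-raises the last error if every attempt fails.
    """
    for i in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))
```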

## Extending for Other Insurance Providers

The system is designed for multi-provider extensibility:

1. **Data layer:** Each chunk in Qdrant has a `provider` field (currently `"UnitedHealthcare"`). Adding a new provider means running the same pipeline with a new provider slug; chunks coexist in the same collection.
2. **Scraper:** `scraper/download_policies.py` can be adapted for any provider's website. The extractor and chunker handle standard medical policy PDF structures.
3. **Embedding:** The same MedEmbed model works for all medical content. New provider chunks are embedded and upserted alongside existing ones.
4. **Retrieval:** Add a `provider_filter` parameter to narrow results by provider, or query across all providers simultaneously.
5. **UI:** Add a provider selector dropdown in the Streamlit sidebar, a one-line change.

```python
# Example: adding Aetna
retriever.retrieve(query, provider_filter="aetna")
```
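Under the hood, such a `provider_filter` could map to a Qdrant payload filter on the indexed `provider` field. A minimal sketch using Qdrant's JSON-style filter shape (the real retriever might use the client's typed filter models instead):

```python
from typing import Optional


def provider_filter(provider: Optional[str]) -> Optional[dict]:
    """Build a Qdrant-style payload filter, or None to search all providers."""
    if provider is None:
        return None
    return {"must": [{"key": "provider", "match": {"value": provider}}]}
```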

## Local Development Setup

```bash
# 1. Clone the repo
git clone https://github.com/<your-username>/uhc-policy-chatbot.git
cd uhc-policy-chatbot

# 2. Create a virtual environment
python3 -m venv venv
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure environment variables
cp .env.example .env
# Edit .env with your Qdrant and Groq API keys

# 5. Run the Streamlit app
streamlit run app.py

# Or use the CLI with Ollama (local LLM)
ollama serve &
ollama pull phi3.5
python -m chatbot.cli
```

## Environment Variables

| Variable | Description | Required |
|---|---|---|
| `QDRANT_URL` | Qdrant Cloud cluster URL | Yes |
| `QDRANT_API_KEY` | Qdrant Cloud API key | Yes |
| `QDRANT_COLLECTION` | Collection name (default: `uhc_policies`) | No |
| `GROQ_API_KEY` | Groq API key (free tier available) | Yes (web) |
| `GROQ_MODEL` | Groq model (default: `llama-3.1-8b-instant`) | No |
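A sketch of how a config module might validate the required variables and apply the documented defaults (the function and key names are illustrative):

```python
import os

REQUIRED = ("QDRANT_URL", "QDRANT_API_KEY", "GROQ_API_KEY")


def load_config(env=os.environ):
    """Fail fast on missing required variables; fill in documented defaults."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "qdrant_url": env["QDRANT_URL"],
        "qdrant_api_key": env["QDRANT_API_KEY"],
        "collection": env.get("QDRANT_COLLECTION", "uhc_policies"),
        "groq_api_key": env["GROQ_API_KEY"],
        "groq_model": env.get("GROQ_MODEL", "llama-3.1-8b-instant"),
    }
```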

## Tech Stack

| Component | Technology |
|---|---|
| Embedding model | MedEmbed-large-v0.1 (1024-dim) |
| Vector database | Qdrant Cloud |
| LLM (deployed) | Llama 3.1 8B via Groq (560 tok/s) |
| LLM (local dev) | Phi-3.5 Mini via Ollama |
| Web framework | Streamlit |
| Hosting | HuggingFace Spaces (free tier) |
| Text-to-speech | Kokoro ONNX (82M) |
| PDF extraction | pdfplumber + BeautifulSoup |