Upload README.md with huggingface_hub

README.md
A RAG-powered chatbot that answers questions about UnitedHealthcare (UHC) medical policies.
   - *"Are intrapulmonary percussive ventilation devices covered for home use?"*
4. The chatbot will search relevant policy chunks, then stream an answer with citations.
5. Click **"Sources"** below each answer to see the exact policy sections used.
6. Enable **"Read answers aloud"** in the sidebar to hear answers via Kokoro TTS.
7. Use **"Clear conversation"** in the sidebar to start a new session.

The chatbot only answers from official UHC policy documents – it will tell you if it doesn't have enough information rather than guessing.

1. User types a question in the Streamlit chat interface
2. The query is encoded into a 1024-dimensional vector using **MedEmbed** (loaded once, cached in memory)
3. The vector is sent to **Qdrant Cloud** for similarity search – returns top-K policy chunks with metadata
4. Retrieved chunks are deduplicated, scored with section priority boosts, and formatted into a context block
5. The context + query + system prompt are sent to **Groq API** (Llama 3.1 8B) for answer generation
6. The response is streamed token-by-token back to the user with source citations
7. If TTS is enabled, the response text is synthesized into audio using **Kokoro ONNX** and played in-browser

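Steps 2–6 above can be sketched as a single pipeline. Everything below is a stand-in (the stub encoder, search, and streaming calls are not the project's real clients); only the control flow mirrors the numbered list:

```python
from typing import Iterator, List, Tuple

def encode_query(query: str) -> List[float]:
    # Stand-in for the cached MedEmbed encoder; the real one returns a 1024-dim vector.
    return [float(len(query))] * 4

def search_chunks(vector: List[float], top_k: int = 5) -> List[Tuple[str, str, float]]:
    # Stand-in for the Qdrant similarity search: (policy, chunk text, score) per hit.
    return [("Policy A", "Covered when medically necessary.", 0.82),
            ("Policy B", "Not covered for home use.", 0.75)][:top_k]

def generate_answer(context: str, query: str) -> Iterator[str]:
    # Stand-in for the streamed LLM call (context + query + system prompt).
    yield from ["Based", "on", "the", "policy", "context..."]

def answer(query: str) -> str:
    vector = encode_query(query)                      # step 2
    hits = search_chunks(vector)                      # step 3
    context = "\n\n".join(t for _, t, _ in hits)      # step 4 (simplified)
    return " ".join(generate_answer(context, query))  # steps 5-6

print(answer("Are IPV devices covered for home use?"))
```

In the real app each stub would be replaced by the corresponding module (`embedding`, retriever, Groq/Ollama client) while the orchestration stays the same.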
### Low-Level Design (LLD)

```
│   ├── llm_groq.py        # Groq API client (deployed)
│   ├── llm.py             # Ollama client (local dev)
│   ├── prompts.py         # System prompt, context formatting, deduplication
│   ├── tts.py             # Kokoro ONNX text-to-speech
│   └── cli.py             # CLI interface (local dev)
│
├── embedding/             # Embedding pipeline
│   ...
│   ├── search.py          # Standalone search CLI for testing
│   └── test_retrieval.py  # Batch retrieval evaluation (10 test cases)
│
├── tests/                 # Evaluation suite
│   └── eval_100.py        # 100-prompt retrieval + LLM evaluation
│
└── scraper/               # Data ingestion pipeline
    ├── download_policies.py   # Scrape PDFs from UHC website
    ├── extract_pdf_text.py    # PDF → structured sections with metadata
```

- Connects to Qdrant Cloud; supports both cloud and local Qdrant
- Encodes queries → cosine similarity search → returns `ChunkResult` dataclasses
- Filters out low-value sections (References, Application) that pollute results
- Section priority boosting (Coverage Rationale +0.04, Coverage Summary +0.03) so authoritative statements rank above clinical studies
- Retry logic with exponential backoff for transient Qdrant errors

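The filtering and boosting bullets amount to a small post-processing pass over raw similarity scores. The boost values below are the ones quoted above; the section names in the sample data and the `(section, score)` shape are illustrative:

```python
SKIP_SECTIONS = {"References", "Application"}   # low-value sections, dropped entirely
SECTION_BOOST = {"Coverage Rationale": 0.04,    # boosts quoted in the bullet above
                 "Coverage Summary": 0.03}

def rerank(hits: list) -> list:
    """hits: (section, score) pairs from the similarity search."""
    kept = [(sec, score + SECTION_BOOST.get(sec, 0.0))
            for sec, score in hits if sec not in SKIP_SECTIONS]
    return sorted(kept, key=lambda h: h[1], reverse=True)

hits = [("Clinical Evidence", 0.80),
        ("Coverage Rationale", 0.78),   # boosted to 0.82, now outranks the study
        ("References", 0.77)]           # filtered out
print(rerank(hits))
```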
**`chatbot/prompts.py` – Prompt Engineering**
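A minimal sketch of the deduplication and context formatting this module covers, assuming chunk results carry policy, section, text, and score fields as described for the retriever (the field names and layout are assumptions):

```python
from dataclasses import dataclass

@dataclass
class ChunkResult:
    policy: str
    section: str
    text: str
    score: float

def dedupe(chunks: list) -> list:
    """Keep only the highest-scoring chunk per (policy, section) pair."""
    best = {}
    for c in chunks:
        key = (c.policy, c.section)
        if key not in best or c.score > best[key].score:
            best[key] = c
    return sorted(best.values(), key=lambda c: c.score, reverse=True)

def format_context(chunks: list) -> str:
    """Render deduplicated chunks into the context block placed in the prompt."""
    return "\n\n".join(f"[{c.policy} / {c.section}]\n{c.text}" for c in chunks)

hits = [
    ChunkResult("IPV Devices", "Coverage Rationale", "Covered when...", 0.82),
    ChunkResult("IPV Devices", "Coverage Rationale", "Covered when...", 0.80),  # dup hit
    ChunkResult("IPV Devices", "Background", "IPV delivers...", 0.71),
]
print(format_context(dedupe(hits)))
```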
- Graceful rate-limit handling (Groq free tier: 250K TPM)
- Same `chat_stream()` / `chat()` interface as the Ollama client for interchangeability

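The shared interface can be expressed as a structural type, so calling code swaps Groq and Ollama backends without branching. The method shapes below are assumptions based only on the names quoted above:

```python
from typing import Iterator, Protocol

class LLMClient(Protocol):
    # Assumed shape of the shared interface; the real signatures may differ.
    def chat(self, messages: list) -> str: ...
    def chat_stream(self, messages: list) -> Iterator[str]: ...

class FakeClient:
    """Stand-in for either backend (Groq or Ollama) in tests."""
    def chat(self, messages: list) -> str:
        return "".join(self.chat_stream(messages))

    def chat_stream(self, messages: list) -> Iterator[str]:
        yield "ok"

def answer(client: LLMClient, question: str) -> str:
    # Caller never needs to know which backend it was handed.
    return client.chat([{"role": "user", "content": question}])

print(answer(FakeClient(), "Is IPV covered?"))
```

Using `typing.Protocol` (structural typing) means neither real client has to inherit from a shared base class; matching the two method names is enough.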
**`chatbot/tts.py` – Text-to-Speech**
- Uses [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M parameter model, ~300MB)
- Auto-downloads model files from HuggingFace Hub on first use
- Generates WAV audio from LLM response text, played in-browser via `st.audio`
- Toggleable via sidebar switch – disabled by default to save resources

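The synthesis path can be sketched with a stub in place of the model; the kokoro-onnx call mentioned in the comment is an assumption about that library's API, while the WAV packing and `st.audio` hand-off follow the bullets above:

```python
import io
import math
import struct
import wave

def synthesize(text: str) -> tuple:
    # Stand-in synthesizer: a short sine tone instead of real speech.
    # With kokoro-onnx the equivalent step would be roughly (API assumed):
    #   samples, sample_rate = Kokoro("model.onnx", "voices.bin").create(text, ...)
    sample_rate = 24000
    samples = [0.3 * math.sin(2 * math.pi * 440 * t / sample_rate)
               for t in range(sample_rate // 10)]
    return samples, sample_rate

def to_wav_bytes(samples, sample_rate: int) -> bytes:
    """Pack float samples into an in-memory 16-bit mono WAV (what st.audio accepts)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
    return buf.getvalue()

audio = to_wav_bytes(*synthesize("Coverage is limited to..."))
# In the app this would be rendered with: st.audio(audio, format="audio/wav")
```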
**`scraper/extract_pdf_text.py` – PDF Extraction**
- Paragraph-level extraction using `pdfplumber` (not line-by-line)
- Robust header/footer/sidebar removal with regex patterns

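The header/footer removal can be sketched as a regex filter applied to each page's extracted text; the patterns here are invented examples, not the scraper's real ones:

```python
import re

# Invented example patterns; the real ones live in the scraper module.
BOILERPLATE = [
    re.compile(r"^Page \d+ of \d+$"),
    re.compile(r"^Proprietary Information of UnitedHealthcare", re.I),
]

def clean_page(text: str) -> str:
    """Drop header/footer lines, then merge what remains into one paragraph."""
    lines = [ln.strip() for ln in text.splitlines()]
    kept = [ln for ln in lines
            if ln and not any(p.match(ln) for p in BOILERPLATE)]
    # Paragraph-level flow: with pdfplumber one would call page.extract_text()
    # per page and then apply a cleanup pass like this one.
    return " ".join(kept)

page = "Page 3 of 12\nIntrapulmonary percussive ventilation (IPV)\nis covered when...\n"
print(clean_page(page))
```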
| LLM (local dev) | Phi-3.5 Mini via Ollama |
| Web Framework | Streamlit |
| Hosting | HuggingFace Spaces (free tier) |
| Text-to-Speech | [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M) |
| PDF Extraction | pdfplumber + BeautifulSoup |