Upload README.md with huggingface_hub

README.md
A RAG-powered chatbot that answers questions about UnitedHealthcare (UHC) medical policies.
   - *"Are intrapulmonary percussive ventilation devices covered for home use?"*
4. The chatbot will search relevant policy chunks, then stream an answer with citations.
5. Click **"Sources"** below each answer to see the exact policy sections used.
6. Enable **"Read answers aloud"** in the sidebar to hear answers via Kokoro TTS.
7. Use **"Clear conversation"** in the sidebar to start a new session.

The chatbot only answers from official UHC policy documents – it will tell you if it doesn't have enough information rather than guessing.

1. User types a question in the Streamlit chat interface
2. The query is encoded into a 1024-dimensional vector using **MedEmbed** (loaded once, cached in memory)
3. The vector is sent to **Qdrant Cloud** for similarity search – returns top-K policy chunks with metadata
4. Retrieved chunks are deduplicated, scored with section priority boosts, and formatted into a context block
5. The context + query + system prompt are sent to **Groq API** (Llama 3.1 8B) for answer generation
6. The response is streamed token-by-token back to the user with source citations
7. If TTS is enabled, the response text is synthesized into audio using **Kokoro ONNX** and played in-browser

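Steps 2–6 above can be sketched as a single pipeline. Everything below is a stand-in (the stub encoder, search, and streaming calls are not the project's real clients); only the control flow mirrors the numbered list:

```python
from typing import Iterator, List, Tuple

def encode_query(query: str) -> List[float]:
    # Stand-in for the cached MedEmbed encoder; the real one returns a 1024-dim vector.
    return [float(len(query))] * 4

def search_chunks(vector: List[float], top_k: int = 5) -> List[Tuple[str, str, float]]:
    # Stand-in for the Qdrant similarity search: (policy, chunk text, score) per hit.
    return [("Policy A", "Covered when medically necessary.", 0.82),
            ("Policy B", "Not covered for home use.", 0.75)][:top_k]

def generate_answer(context: str, query: str) -> Iterator[str]:
    # Stand-in for the streamed LLM call (context + query + system prompt).
    yield from ["Based", "on", "the", "policy", "context..."]

def answer(query: str) -> str:
    vector = encode_query(query)                      # step 2
    hits = search_chunks(vector)                      # step 3
    context = "\n\n".join(t for _, t, _ in hits)      # step 4 (simplified)
    return " ".join(generate_answer(context, query))  # steps 5-6

print(answer("Are IPV devices covered for home use?"))
```

In the real app each stub would be replaced by the corresponding module (`embedding`, retriever, Groq/Ollama client) while the orchestration stays the same.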
### Low-Level Design (LLD)

```
│   ├── llm_groq.py        # Groq API client (deployed)
│   ├── llm.py             # Ollama client (local dev)
│   ├── prompts.py         # System prompt, context formatting, deduplication
│   ├── tts.py             # Kokoro ONNX text-to-speech
│   └── cli.py             # CLI interface (local dev)
│
├── embedding/             # Embedding pipeline
│   ...
│   ├── search.py          # Standalone search CLI for testing
│   └── test_retrieval.py  # Batch retrieval evaluation (10 test cases)
│
├── tests/                 # Evaluation suite
│   └── eval_100.py        # 100-prompt retrieval + LLM evaluation
│
└── scraper/               # Data ingestion pipeline
    ├── download_policies.py   # Scrape PDFs from UHC website
    ├── extract_pdf_text.py    # PDF → structured sections with metadata
```

- Connects to Qdrant Cloud; supports both cloud and local Qdrant
- Encodes queries → cosine similarity search → returns `ChunkResult` dataclasses
- Filters out low-value sections (References, Application) that pollute results
- Section priority boosting (Coverage Rationale +0.04, Coverage Summary +0.03) so authoritative statements rank above clinical studies
- Retry logic with exponential backoff for transient Qdrant errors

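The filtering and boosting bullets amount to a small post-processing pass over raw similarity scores. The boost values below are the ones quoted above; the section names in the sample data and the `(section, score)` shape are illustrative:

```python
SKIP_SECTIONS = {"References", "Application"}   # low-value sections, dropped entirely
SECTION_BOOST = {"Coverage Rationale": 0.04,    # boosts quoted in the bullet above
                 "Coverage Summary": 0.03}

def rerank(hits: list) -> list:
    """hits: (section, score) pairs from the similarity search."""
    kept = [(sec, score + SECTION_BOOST.get(sec, 0.0))
            for sec, score in hits if sec not in SKIP_SECTIONS]
    return sorted(kept, key=lambda h: h[1], reverse=True)

hits = [("Clinical Evidence", 0.80),
        ("Coverage Rationale", 0.78),   # boosted to 0.82, now outranks the study
        ("References", 0.77)]           # filtered out
print(rerank(hits))
```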
**`chatbot/prompts.py` – Prompt Engineering**
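A minimal sketch of the deduplication and context formatting this module covers, assuming chunk results carry policy, section, text, and score fields as described for the retriever (the field names and layout are assumptions):

```python
from dataclasses import dataclass

@dataclass
class ChunkResult:
    policy: str
    section: str
    text: str
    score: float

def dedupe(chunks: list) -> list:
    """Keep only the highest-scoring chunk per (policy, section) pair."""
    best = {}
    for c in chunks:
        key = (c.policy, c.section)
        if key not in best or c.score > best[key].score:
            best[key] = c
    return sorted(best.values(), key=lambda c: c.score, reverse=True)

def format_context(chunks: list) -> str:
    """Render deduplicated chunks into the context block placed in the prompt."""
    return "\n\n".join(f"[{c.policy} / {c.section}]\n{c.text}" for c in chunks)

hits = [
    ChunkResult("IPV Devices", "Coverage Rationale", "Covered when...", 0.82),
    ChunkResult("IPV Devices", "Coverage Rationale", "Covered when...", 0.80),  # dup hit
    ChunkResult("IPV Devices", "Background", "IPV delivers...", 0.71),
]
print(format_context(dedupe(hits)))
```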
- Graceful rate-limit handling (Groq free tier: 250K TPM)
- Same `chat_stream()` / `chat()` interface as the Ollama client for interchangeability

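The shared interface can be expressed as a structural type, so calling code swaps Groq and Ollama backends without branching. The method shapes below are assumptions based only on the names quoted above:

```python
from typing import Iterator, Protocol

class LLMClient(Protocol):
    # Assumed shape of the shared interface; the real signatures may differ.
    def chat(self, messages: list) -> str: ...
    def chat_stream(self, messages: list) -> Iterator[str]: ...

class FakeClient:
    """Stand-in for either backend (Groq or Ollama) in tests."""
    def chat(self, messages: list) -> str:
        return "".join(self.chat_stream(messages))

    def chat_stream(self, messages: list) -> Iterator[str]:
        yield "ok"

def answer(client: LLMClient, question: str) -> str:
    # Caller never needs to know which backend it was handed.
    return client.chat([{"role": "user", "content": question}])

print(answer(FakeClient(), "Is IPV covered?"))
```

Using `typing.Protocol` (structural typing) means neither real client has to inherit from a shared base class; matching the two method names is enough.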
**`chatbot/tts.py` – Text-to-Speech**
- Uses [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M parameter model, ~300MB)
- Auto-downloads model files from HuggingFace Hub on first use
- Generates WAV audio from LLM response text, played in-browser via `st.audio`
- Toggleable via sidebar switch – disabled by default to save resources

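The synthesis path can be sketched with a stub in place of the model; the kokoro-onnx call mentioned in the comment is an assumption about that library's API, while the WAV packing and `st.audio` hand-off follow the bullets above:

```python
import io
import math
import struct
import wave

def synthesize(text: str) -> tuple:
    # Stand-in synthesizer: a short sine tone instead of real speech.
    # With kokoro-onnx the equivalent step would be roughly (API assumed):
    #   samples, sample_rate = Kokoro("model.onnx", "voices.bin").create(text, ...)
    sample_rate = 24000
    samples = [0.3 * math.sin(2 * math.pi * 440 * t / sample_rate)
               for t in range(sample_rate // 10)]
    return samples, sample_rate

def to_wav_bytes(samples, sample_rate: int) -> bytes:
    """Pack float samples into an in-memory 16-bit mono WAV (what st.audio accepts)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))
    return buf.getvalue()

audio = to_wav_bytes(*synthesize("Coverage is limited to..."))
# In the app this would be rendered with: st.audio(audio, format="audio/wav")
```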
**`scraper/extract_pdf_text.py` – PDF Extraction**
- Paragraph-level extraction using `pdfplumber` (not line-by-line)
- Robust header/footer/sidebar removal with regex patterns

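The header/footer removal can be sketched as a regex filter applied to each page's extracted text; the patterns here are invented examples, not the scraper's real ones:

```python
import re

# Invented example patterns; the real ones live in the scraper module.
BOILERPLATE = [
    re.compile(r"^Page \d+ of \d+$"),
    re.compile(r"^Proprietary Information of UnitedHealthcare", re.I),
]

def clean_page(text: str) -> str:
    """Drop header/footer lines, then merge what remains into one paragraph."""
    lines = [ln.strip() for ln in text.splitlines()]
    kept = [ln for ln in lines
            if ln and not any(p.match(ln) for p in BOILERPLATE)]
    # Paragraph-level flow: with pdfplumber one would call page.extract_text()
    # per page and then apply a cleanup pass like this one.
    return " ".join(kept)

page = "Page 3 of 12\nIntrapulmonary percussive ventilation (IPV)\nis covered when...\n"
print(clean_page(page))
```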
| LLM (local dev) | Phi-3.5 Mini via Ollama |
| Web Framework | Streamlit |
| Hosting | HuggingFace Spaces (free tier) |
| Text-to-Speech | [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M) |
| PDF Extraction | pdfplumber + BeautifulSoup |