mxp1404 committed
Commit abdbe55 · verified · 1 Parent(s): a9c0333

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +16 -3
README.md CHANGED
@@ -27,7 +27,8 @@ A RAG-powered chatbot that answers questions about UnitedHealthcare (UHC) medica
 - *"Are intrapulmonary percussive ventilation devices covered for home use?"*
 4. The chatbot will search relevant policy chunks, then stream an answer with citations.
 5. Click **"📚 Sources"** below each answer to see the exact policy sections used.
-6. Use **"🗑️ Clear conversation"** in the sidebar to start a new session.
+6. Enable **"🔊 Read answers aloud"** in the sidebar to hear answers via Kokoro TTS.
+7. Use **"🗑️ Clear conversation"** in the sidebar to start a new session.
 
 The chatbot only answers from official UHC policy documents — it will tell you if it doesn't have enough information rather than guessing.
@@ -60,9 +61,10 @@ The chatbot only answers from official UHC policy documents — it will tell you
 1. User types a question in the Streamlit chat interface
 2. The query is encoded into a 1024-dimensional vector using **MedEmbed** (loaded once, cached in memory)
 3. The vector is sent to **Qdrant Cloud** for similarity search — returns top-K policy chunks with metadata
-4. Retrieved chunks are deduplicated, truncated, and formatted into a context block
+4. Retrieved chunks are deduplicated, scored with section priority boosts, and formatted into a context block
 5. The context + query + system prompt are sent to **Groq API** (Llama 3.1 8B) for answer generation
 6. The response is streamed token-by-token back to the user with source citations
+7. If TTS is enabled, the response text is synthesized into audio using **Kokoro ONNX** and played in-browser
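Step 4 (deduplicate retrieved chunks and format them into a context block) could be sketched roughly as below. This is a minimal sketch, not the repo's actual code: the `policy`/`section`/`text` payload field names and the character budget are assumptions.

```python
def build_context(chunks: list[dict], max_chars: int = 4000) -> str:
    """Deduplicate retrieved chunks by (policy, section) and format a context block."""
    seen, parts, used = set(), [], 0
    for c in chunks:  # assumed sorted best-first by similarity score
        key = (c["policy"], c["section"])
        if key in seen:  # drop duplicate sections retrieved more than once
            continue
        seen.add(key)
        text = c["text"][: max_chars - used]  # truncate to the remaining budget
        parts.append(f"[{c['policy']} / {c['section']}]\n{text}")
        used += len(text)
        if used >= max_chars:
            break
    return "\n\n".join(parts)
```

The (policy, section) key keeps distinct sections of the same policy while collapsing near-identical hits from overlapping chunking.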
 
 ### Low-Level Design (LLD)
 
@@ -80,6 +82,7 @@ uhc/
 │ ├── llm_groq.py # Groq API client (deployed)
 │ ├── llm.py # Ollama client (local dev)
 │ ├── prompts.py # System prompt, context formatting, deduplication
+│ ├── tts.py # Kokoro ONNX text-to-speech
 │ └── cli.py # CLI interface (local dev)
 │
 ├── embedding/ # Embedding pipeline
@@ -90,6 +93,9 @@ uhc/
 │ ├── search.py # Standalone search CLI for testing
 │ └── test_retrieval.py # Batch retrieval evaluation (10 test cases)
 │
+├── tests/ # Evaluation suite
+│ └── eval_100.py # 100-prompt retrieval + LLM evaluation
+│
 └── scraper/ # Data ingestion pipeline
 ├── download_policies.py # Scrape PDFs from UHC website
 ├── extract_pdf_text.py # PDF → structured sections with metadata
@@ -106,7 +112,7 @@ uhc/
 - Connects to Qdrant Cloud; supports both cloud and local Qdrant
 - Encodes queries → cosine similarity search → returns `ChunkResult` dataclasses
 - Filters out low-value sections (References, Application) that pollute results
-- Boosts Coverage Rationale chunks (+0.02 score) so authoritative coverage statements always surface
+- Section priority boosting (Coverage Rationale +0.04, Coverage Summary +0.03) so authoritative statements rank above clinical studies
 - Retry logic with exponential backoff for transient Qdrant errors
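The retriever's section filtering, priority boosting, and retry-with-backoff could be sketched as follows. The boost values are the ones stated in the diff; the field names, retry parameters, and exception type are assumptions for illustration.

```python
import random
import time

SECTION_BOOST = {"Coverage Rationale": 0.04, "Coverage Summary": 0.03}
SKIP_SECTIONS = {"References", "Application"}  # low-value sections filtered out

def rerank(hits: list[dict]) -> list[dict]:
    """Filter noisy sections, boost authoritative ones, then sort by score."""
    kept = [dict(h) for h in hits if h["section"] not in SKIP_SECTIONS]
    for h in kept:
        h["score"] += SECTION_BOOST.get(h["section"], 0.0)
    return sorted(kept, key=lambda h: h["score"], reverse=True)

def with_retries(fn, attempts=4, base_delay=0.5):
    """Retry a transient-failure-prone call with exponential backoff and jitter."""
    for i in range(attempts):
        try:
            return fn()
        except ConnectionError:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i + random.random()))
```

A small additive boost (rather than a multiplier) nudges borderline Coverage Rationale chunks above clinical-study text without letting a weak match outrank a strong one.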
 
 **`chatbot/prompts.py` — Prompt Engineering**
@@ -120,6 +126,12 @@ uhc/
 - Graceful rate-limit handling (Groq free tier: 250K TPM)
 - Same `chat_stream()` / `chat()` interface as the Ollama client for interchangeability
 
+**`chatbot/tts.py` — Text-to-Speech**
+- Uses [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M parameter model, ~300MB)
+- Auto-downloads model files from HuggingFace Hub on first use
+- Generates WAV audio from LLM response text, played in-browser via `st.audio`
+- Toggleable via sidebar switch — disabled by default to save resources
+
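For `chatbot/tts.py`, the step of packaging synthesized samples into WAV bytes for `st.audio` might look like the sketch below. It uses only the stdlib; the Kokoro call itself is shown only as a comment, and its signature and voice name are assumptions, not verified against the library.

```python
import io
import struct
import wave

def samples_to_wav(samples, sample_rate=24000):
    """Pack mono float samples in [-1, 1] into 16-bit PCM WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples))
    return buf.getvalue()

# In the app, the samples would come from Kokoro (assumed API, unverified):
#   samples, sr = kokoro.create(answer_text, voice="af_sky")
#   st.audio(samples_to_wav(samples, sr), format="audio/wav")
```

`st.audio` accepts raw bytes, so no temporary file is needed between synthesis and playback.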
 **`scraper/extract_pdf_text.py` — PDF Extraction**
 - Paragraph-level extraction using `pdfplumber` (not line-by-line)
 - Robust header/footer/sidebar removal with regex patterns
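The header/footer removal described for `extract_pdf_text.py` amounts to regex filtering of repeated page furniture before chunking. The patterns below are illustrative guesses at UHC policy boilerplate, not the repo's actual ones.

```python
import re

# Illustrative page-furniture patterns; the real ones in the repo differ.
BOILERPLATE = [
    re.compile(r"^\s*Page \d+ of \d+\s*$"),
    re.compile(r"^\s*Proprietary Information of UnitedHealthcare.*$", re.IGNORECASE),
    re.compile(r"^\s*Effective \d{2}/\d{2}/\d{4}\s*$"),
]

def strip_page_furniture(lines):
    """Drop header/footer lines so only policy prose reaches the chunker."""
    return [ln for ln in lines if not any(p.match(ln) for p in BOILERPLATE)]
```

Anchoring each pattern to the whole line (`^...$`) avoids deleting prose that merely mentions a page number or date.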
@@ -229,4 +241,5 @@ python -m chatbot.cli
 | LLM (local dev) | Phi-3.5 Mini via Ollama |
 | Web Framework | Streamlit |
 | Hosting | HuggingFace Spaces (free tier) |
+| Text-to-Speech | [Kokoro ONNX](https://github.com/thewh1teagle/kokoro-onnx) (82M) |
 | PDF Extraction | pdfplumber + BeautifulSoup |