# Earnings Intelligence Copilot: Fine-tuned Phi-3.5-mini (Merged)

A QLoRA fine-tuned and merged version of Phi-3.5-mini-instruct for structured KPI extraction from SEC filings. Part of the Earnings Intelligence Copilot, a multi-agent system that ingests SEC filings, extracts KPIs, and generates citation-grounded investment memos.
## What it does

- Extracts financial KPIs (Revenue, Gross Margin, Operating Income, EPS, Free Cash Flow) from SEC filing chunks as structured JSON
- Returns `{"confidence": "UNVERIFIABLE"}` instead of hallucinating when the data is not present
- Always includes a `source_quote` field grounding every answer in the original filing text
## Model Details
| Property | Value |
|---|---|
| Base model | microsoft/Phi-3.5-mini-instruct |
| Fine-tuning method | QLoRA (4-bit NF4 quantization) |
| LoRA rank | r=16, alpha=32 |
| Target modules | q_proj, v_proj, k_proj, o_proj |
| Training examples | 619 balanced examples (50% HIGH / 50% UNVERIFIABLE) |
| Data source | 10-K and 10-Q filings, 20 S&P 500 companies (2020-2024) |
| Training hardware | Kaggle T4 GPU (~75 minutes) |
| Model type | Full merged model (adapter + base combined) |
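The QLoRA settings in the table can be expressed as a `peft`/`bitsandbytes` configuration. This is a sketch of the setup described above, not the published training script; any hyperparameter not listed in the table (e.g. dropout, compute dtype) is an assumption:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit NF4 quantization for the frozen base model (per the table above)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # assumption: fp16 compute on a T4
)

# LoRA adapter matching the card: r=16, alpha=32, attention projections only
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```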
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "ratnasekhar/earnings-copilot-phi3-merged",
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    "ratnasekhar/earnings-copilot-phi3-merged",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="eager",
)
model.eval()

chunk = "Net sales for Q1 FY2024 were $119.6 billion, an increase of 2% compared to Q1 FY2023."

# Phi-3.5 chat format: <|user|> ... <|end|> followed by <|assistant|>
prompt = (
    "<|user|>\n"
    "You are a financial KPI extraction model. Extract metrics as JSON. "
    "Output {\"confidence\": \"UNVERIFIABLE\"} if not found. Never invent numbers.\n\n"
    "Filing chunk:\n" + chunk + "\n\n"
    "Extract: What was total revenue and its YoY change?<|end|>\n"
    "<|assistant|>\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=150,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
        use_cache=False,
    )

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
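The decoded text is expected to be a single JSON object, but decoder output can include stray text around it. A small parsing helper (hypothetical, not part of the model) can guard against that and fall back to the model's own refusal format:

```python
import json

def parse_kpi_output(text: str) -> dict:
    """Extract the first {...} span from model output as JSON;
    fall back to an UNVERIFIABLE record if parsing fails."""
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            return json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            pass
    return {"confidence": "UNVERIFIABLE", "reason": "Output was not valid JSON."}
```

Treating unparseable output as UNVERIFIABLE keeps downstream agents from acting on malformed extractions.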
## Example Outputs
When data IS present:

```json
{
  "metric": "Total Revenue",
  "value": "$119.6 billion",
  "unit": "Billion",
  "period": "Q1 FY2024",
  "yoy_change": "+2%",
  "source_quote": "Net sales for Q1 FY2024 were $119.6 billion",
  "confidence": "HIGH"
}
```
When data is NOT present:

```json
{
  "confidence": "UNVERIFIABLE",
  "reason": "The chunk does not contain any specific revenue figures."
}
```
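The two shapes above imply a simple contract: a HIGH-confidence record must carry its grounding fields, while an UNVERIFIABLE record needs none. A hypothetical validator (not shipped with the model) makes that contract explicit:

```python
def validate_extraction(record: dict) -> bool:
    """Check the two output shapes shown above: HIGH records must include
    a metric and a grounding source_quote; UNVERIFIABLE records pass as-is."""
    conf = record.get("confidence")
    if conf == "UNVERIFIABLE":
        return True
    if conf == "HIGH":
        return bool(record.get("metric")) and bool(record.get("source_quote"))
    return False  # unknown confidence value
```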
## Key Design Decision: Class Balance
The raw LLM-generated training data was 93% UNVERIFIABLE examples. Training on such imbalanced data would teach the model to refuse almost every query, so we deliberately resampled to a 50/50 HIGH/UNVERIFIABLE split to teach both behaviors equally. This balancing is the core fine-tuning contribution of the project.
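The resampling step can be sketched as downsampling the majority class to the size of the minority class (the exact resampling code is not published; this is a minimal stand-in):

```python
import random

def balance_labels(examples: list, seed: int = 0) -> list:
    """Downsample the majority class so HIGH and UNVERIFIABLE are 50/50."""
    high = [e for e in examples if e["confidence"] == "HIGH"]
    unver = [e for e in examples if e["confidence"] == "UNVERIFIABLE"]
    n = min(len(high), len(unver))
    rng = random.Random(seed)
    balanced = rng.sample(high, n) + rng.sample(unver, n)
    rng.shuffle(balanced)
    return balanced
```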
## Related Models
| Model | Description |
|---|---|
| ratnasekhar/earnings-copilot-mistral-7b | Mistral-7B LoRA adapter (larger, higher-quality) |
| ratnasekhar/earnings-copilot-phi3-merged | Phi-3.5-mini merged model (this model, deployable) |
## System Architecture
```
SEC Filings (220 filings, 20 S&P 500 tickers)
        ↓
Qdrant Cloud (2,464 financial table chunks)
        ↓
Phi-3.5-mini (this model: KPI extraction)
        ↓
Verification Agent (cross-checks every number)
        ↓
Citation-grounded Investment Memo
```
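The verification agent's cross-check can be illustrated with a minimal sketch: confirm that every number in an extracted value actually appears in the source chunk. This is a hypothetical stand-in; the real agent is not published here:

```python
import re

def verify_number(record: dict, chunk: str) -> bool:
    """Cross-check an extraction: every numeric token in the extracted
    value must appear verbatim in the source filing chunk."""
    value = record.get("value", "")
    numbers = re.findall(r"\d+(?:\.\d+)?", value)
    return all(n in chunk for n in numbers)
```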