# HealthSLM (15M)

## Model Description
HealthSLM is a lightweight, 15-million-parameter decoder-only transformer trained entirely from scratch on biomedical text. Its architecture closely follows LLaMA and Mistral and is optimised for efficient training and inference.
Despite its small parameter footprint, it implements state-of-the-art transformer components including Grouped-Query Attention (GQA), Rotary Positional Embeddings (RoPE), SwiGLU feed-forward networks, and RMSNorm. Additionally, the model utilises a custom Byte-Pair Encoding (BPE) tokenizer trained specifically on PubMed abstracts to ensure highly efficient subword encoding of complex biomedical terminology.
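The components named above are standard building blocks. As a rough illustration (not the repository's actual code), RMSNorm and a SwiGLU feed-forward block can be sketched in PyTorch like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescales by the RMS, with no mean
    subtraction and no bias (as used in LLaMA-style models)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: down(SiLU(gate(x)) * up(x))."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```

With the dimensions listed below (hidden 256, feed-forward 512), both modules preserve the input's last dimension.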
- Developer: Rohan Singh
- GitHub Repository: rohansingh-dev/health_slm
- Model Type: Decoder-only core language model
- Language: English
- License: MIT
- Architecture: LLaMA-style autoregressive transformer
## Model Architecture Details
- Total Parameters: ~15.3M
- Layers (Blocks): 12
- Hidden Dimension: 256
- Attention Heads: 4 (Query), 2 (Key/Value via GQA)
- Head Dimension: 64
- Feed-Forward Dimension: 512 (SwiGLU)
- Context Length: 1024 tokens
- Vocabulary Size: 32,000
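As a sanity check, the ~15.3M figure can be tallied by hand from the hyperparameters above, assuming tied input/output embeddings and bias-free linear layers (both assumptions, though the total only works out under them):

```python
# Hypothetical parameter tally for the listed configuration.
vocab, dim, layers = 32_000, 256, 12
n_q_heads, n_kv_heads, head_dim = 4, 2, 64
ffn_dim = 512

embed = vocab * dim                        # token embeddings (assumed tied with the LM head)
q_proj = dim * n_q_heads * head_dim        # 256 -> 256
kv_proj = 2 * dim * n_kv_heads * head_dim  # 256 -> 128, once each for K and V (GQA)
o_proj = dim * dim                         # output projection
attn = q_proj + kv_proj + o_proj
ffn = 3 * dim * ffn_dim                    # gate, up, down matrices in SwiGLU
norms = 2 * dim                            # two RMSNorms per block
per_layer = attn + ffn + norms

total = embed + layers * per_layer + dim   # + final RMSNorm
print(f"{total:,}")                        # 15,276,288, i.e. ~15.3M
```

Note that without tied embeddings the LM head alone would add another ~8.2M parameters, which is why the tying assumption seems likely.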
## Intended Uses & Limitations

### Intended Uses
- Educational / Research: HealthSLM serves as a lightweight, fully understandable baseline for training biomedical language models from scratch.
- Biomedical NLP: Extracting or generating simple biomedical concepts and text.
- Parameter-Efficient Fine-Tuning (PEFT): The model natively supports LoRA fine-tuning for downstream medical QA, summarisation, or instruction-following.
### Limitations & Biases
- Medical Disclaimer: This model is NOT intended to provide clinical diagnoses, medical advice, or treatment recommendations. Its outputs should be strictly considered experimental and must be verified by a licensed human medical professional.
- Scale Limitations: At just 15 million parameters, the model is significantly smaller than foundational models (e.g., Llama-3 8B) and has limited reasoning abilities. It may hallucinate or generate nonsensical text, particularly on long-context tasks.
## Training Details

### Pre-training Data
The model was pre-trained on a corpus of biomedical abstracts fetched directly from the NCBI / PubMed FTP servers.
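The card does not reproduce the preprocessing code, but as a minimal sketch: PubMed baseline files are gzipped XML, and abstract text can be pulled out with the standard library alone. The element names (`PubmedArticle`, `AbstractText`) follow the PubMed XML format; the extraction strategy here is an illustrative assumption, not the repository's pipeline.

```python
import xml.etree.ElementTree as ET

def extract_abstracts(xml_text: str) -> list[str]:
    """Collect the abstract text of each article in a PubMed XML document.

    Multi-part abstracts (Background/Methods/... sections) are joined
    into a single string; articles without an abstract are skipped.
    """
    root = ET.fromstring(xml_text)
    abstracts = []
    for article in root.iter("PubmedArticle"):
        parts = [t.text for t in article.iter("AbstractText") if t.text]
        if parts:
            abstracts.append(" ".join(parts))
    return abstracts

# For a real baseline file, open it with gzip first, e.g.:
#   with gzip.open(path, "rt", encoding="utf-8") as f:
#       abstracts = extract_abstracts(f.read())
```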
### Training Setup
- Hardware: NVIDIA RTX 4060 Mobile (6GB VRAM)
- Optimizer: AdamW
- Precision: bf16
- Learning Rate Schedule: Cosine decay with warmup (Peak LR: 2e-4)
- Weight Decay: 0.1
- Gradient Accumulation: employed to reach an effective batch size of 64.
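The schedule above can be written as a small function. The warmup length and minimum LR are not stated in the card, so the values here are placeholders:

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float = 2e-4,
          warmup_steps: int = 1000, min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay down to min_lr.

    warmup_steps and min_lr are illustrative assumptions; the card only
    specifies "cosine decay with warmup" and a peak LR of 2e-4.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```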
## Getting Started
Because HealthSLM uses a custom PyTorch architecture, you will need the model definition (`model.py`) to run it locally, or you can use the repository's pipeline inference code directly.
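The pipeline code itself is not reproduced here, but once the model definition is available, inference reduces to an autoregressive loop. A minimal greedy-decoding sketch (hypothetical, assuming the model maps a batch of token IDs to per-position logits) looks like:

```python
import torch

@torch.no_grad()
def greedy_generate(model, input_ids: torch.Tensor,
                    max_new_tokens: int = 32,
                    context_len: int = 1024) -> torch.Tensor:
    """Append argmax tokens one at a time, truncating the input to the
    model's 1024-token context window on each step."""
    model.eval()
    for _ in range(max_new_tokens):
        logits = model(input_ids[:, -context_len:])        # (batch, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)
    return input_ids
```

Sampling strategies (temperature, top-k) would slot in where the `argmax` is taken; greedy decoding is just the simplest case.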
## Fine-Tuning and Quantization
The model supports loading in 4-bit precision (NF4) via bitsandbytes, reducing its memory footprint to just a few megabytes.
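The "few megabytes" claim checks out with back-of-envelope arithmetic, assuming ~15.3M weights and NF4's blockwise scheme (bitsandbytes stores one fp32 absmax scale per 64-weight block by default; double quantization, which shrinks the scales further, is ignored here):

```python
params = 15_300_000           # ~15.3M weights
weight_bytes = params * 0.5   # 4 bits per weight
scale_bytes = (params / 64) * 4  # one fp32 scale per 64-value block
total_mb = (weight_bytes + scale_bytes) / 1e6
print(round(total_mb, 1))     # ~8.6 MB
```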
The model also supports LoRA (rank 16) for parameter-efficient fine-tuning on downstream tasks.
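A rank-16 adapter is tiny relative to the base model. For illustration only (this is not the repository's implementation), wrapping a frozen linear layer with trainable low-rank factors looks like:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable rank-r update:
    y = W x + (alpha / r) * B A x, with B zero-initialized so the
    wrapped layer starts out exactly equal to the base layer."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapter trains
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

At rank 16 on a 256x256 projection, the adapter adds only 2 x 16 x 256 = 8,192 trainable parameters per wrapped layer.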