# HealthSLM (15M)

## Model Description
HealthSLM is a lightweight, 15-million-parameter decoder-only transformer trained entirely from scratch on biomedical text. Its architecture closely follows LLaMA and Mistral and is optimised for efficient training and inference.
Despite its small parameter footprint, it implements state-of-the-art transformer components including Grouped-Query Attention (GQA), Rotary Positional Embeddings (RoPE), SwiGLU feed-forward networks, and RMSNorm. Additionally, the model utilises a custom Byte-Pair Encoding (BPE) tokenizer trained specifically on PubMed abstracts to ensure highly efficient subword encoding of complex biomedical terminology.
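The components named above are standard building blocks. As a rough illustration (not the repository's actual code), RMSNorm and a SwiGLU feed-forward block can be sketched in PyTorch like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square layer norm: rescales by the RMS, with no mean
    subtraction and no bias (as used in LLaMA-style models)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: down(SiLU(gate(x)) * up(x))."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden_dim, bias=False)
        self.up = nn.Linear(dim, hidden_dim, bias=False)
        self.down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down(F.silu(self.gate(x)) * self.up(x))
```

With the dimensions listed below (hidden 256, feed-forward 512), both modules preserve the input's last dimension.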
- Developer: Rohan Singh
- GitHub Repository: rohansingh-dev/health_slm
- Model Type: Decoder-only core language model
- Language: English
- License: MIT
- Architecture: LLaMA-style autoregressive transformer
## Model Architecture Details
- Total Parameters: ~15.3M
- Layers (Blocks): 12
- Hidden Dimension: 256
- Attention Heads: 4 (Query), 2 (Key/Value via GQA)
- Head Dimension: 64
- Feed-Forward Dimension: 512 (SwiGLU)
- Context Length: 1024 tokens
- Vocabulary Size: 32,000
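As a sanity check, the ~15.3M figure can be tallied by hand from the hyperparameters above, assuming tied input/output embeddings and bias-free linear layers (both assumptions, though the total only works out under them):

```python
# Hypothetical parameter tally for the listed configuration.
vocab, dim, layers = 32_000, 256, 12
n_q_heads, n_kv_heads, head_dim = 4, 2, 64
ffn_dim = 512

embed = vocab * dim                        # token embeddings (assumed tied with the LM head)
q_proj = dim * n_q_heads * head_dim        # 256 -> 256
kv_proj = 2 * dim * n_kv_heads * head_dim  # 256 -> 128, once each for K and V (GQA)
o_proj = dim * dim                         # output projection
attn = q_proj + kv_proj + o_proj
ffn = 3 * dim * ffn_dim                    # gate, up, down matrices in SwiGLU
norms = 2 * dim                            # two RMSNorms per block
per_layer = attn + ffn + norms

total = embed + layers * per_layer + dim   # + final RMSNorm
print(f"{total:,}")                        # 15,276,288, i.e. ~15.3M
```

Note that without tied embeddings the LM head alone would add another ~8.2M parameters, which is why the tying assumption seems likely.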
## Intended Uses & Limitations

### Intended Uses
- Educational / Research: HealthSLM serves as a lightweight, fully understandable baseline for training biomedical language models from scratch.
- Biomedical NLP: Extracting or generating simple biomedical concepts and text.
- Parameter-Efficient Fine-Tuning (PEFT): The model natively supports LoRA fine-tuning for downstream medical QA, summarisation, or instruction-following.
### Limitations & Biases
- Medical Disclaimer: This model is NOT intended to provide clinical diagnoses, medical advice, or treatment recommendations. Its outputs should be strictly considered experimental and must be verified by a licensed human medical professional.
- Scale Limitations: At just 15 million parameters, the model is significantly smaller than foundational models (e.g., Llama-3 8B) and has limited reasoning abilities. It may hallucinate or generate nonsensical text, particularly on long-context tasks.
## Training Details

### Pre-training Data
The model was pre-trained on a corpus of biomedical abstracts fetched directly from the NCBI / PubMed FTP servers.
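The card does not reproduce the preprocessing code, but as a minimal sketch: PubMed baseline files are gzipped XML, and abstract text can be pulled out with the standard library alone. The element names (`PubmedArticle`, `AbstractText`) follow the PubMed XML format; the extraction strategy here is an illustrative assumption, not the repository's pipeline.

```python
import xml.etree.ElementTree as ET

def extract_abstracts(xml_text: str) -> list[str]:
    """Collect the abstract text of each article in a PubMed XML document.

    Multi-part abstracts (Background/Methods/... sections) are joined
    into a single string; articles without an abstract are skipped.
    """
    root = ET.fromstring(xml_text)
    abstracts = []
    for article in root.iter("PubmedArticle"):
        parts = [t.text for t in article.iter("AbstractText") if t.text]
        if parts:
            abstracts.append(" ".join(parts))
    return abstracts

# For a real baseline file, open it with gzip first, e.g.:
#   with gzip.open(path, "rt", encoding="utf-8") as f:
#       abstracts = extract_abstracts(f.read())
```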
### Training Setup
- Hardware: NVIDIA RTX 4060 Mobile (6GB VRAM)
- Optimizer: AdamW
- Precision: bf16
- Learning Rate Schedule: Cosine decay with warmup (Peak LR: 2e-4)
- Weight Decay: 0.1
- Gradient Accumulation: employed to reach an effective batch size of 64.
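The schedule above can be written as a small function. The warmup length and minimum LR are not stated in the card, so the values here are placeholders:

```python
import math

def lr_at(step: int, total_steps: int, peak_lr: float = 2e-4,
          warmup_steps: int = 1000, min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay down to min_lr.

    warmup_steps and min_lr are illustrative assumptions; the card only
    specifies "cosine decay with warmup" and a peak LR of 2e-4.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```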
## Getting Started
Because HealthSLM uses a custom PyTorch architecture, you will need the model definition (`model.py`) to run it locally, or you can use the repository's pipeline inference code directly.
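The pipeline code itself is not reproduced here, but once the model definition is available, inference reduces to an autoregressive loop. A minimal greedy-decoding sketch (hypothetical, assuming the model maps a batch of token IDs to per-position logits) looks like:

```python
import torch

@torch.no_grad()
def greedy_generate(model, input_ids: torch.Tensor,
                    max_new_tokens: int = 32,
                    context_len: int = 1024) -> torch.Tensor:
    """Append argmax tokens one at a time, truncating the input to the
    model's 1024-token context window on each step."""
    model.eval()
    for _ in range(max_new_tokens):
        logits = model(input_ids[:, -context_len:])        # (batch, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=1)
    return input_ids
```

Sampling strategies (temperature, top-k) would slot in where the `argmax` is taken; greedy decoding is just the simplest case.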
## Fine-Tuning and Quantization
The model supports loading in 4-bit precision (NF4) via bitsandbytes, reducing its memory footprint to just a few megabytes.
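The "few megabytes" claim checks out with back-of-envelope arithmetic, assuming ~15.3M weights and NF4's blockwise scheme (bitsandbytes stores one fp32 absmax scale per 64-weight block by default; double quantization, which shrinks the scales further, is ignored here):

```python
params = 15_300_000           # ~15.3M weights
weight_bytes = params * 0.5   # 4 bits per weight
scale_bytes = (params / 64) * 4  # one fp32 scale per 64-value block
total_mb = (weight_bytes + scale_bytes) / 1e6
print(round(total_mb, 1))     # ~8.6 MB
```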
The model also supports LoRA (rank 16) for parameter-efficient fine-tuning on downstream tasks.
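A rank-16 adapter is tiny relative to the base model. For illustration only (this is not the repository's implementation), wrapping a frozen linear layer with trainable low-rank factors looks like:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear plus a trainable rank-r update:
    y = W x + (alpha / r) * B A x, with B zero-initialized so the
    wrapped layer starts out exactly equal to the base layer."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapter trains
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

At rank 16 on a 256x256 projection, the adapter adds only 2 x 16 x 256 = 8,192 trainable parameters per wrapped layer.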