HealthSLM (15M)

Model Description

HealthSLM is a lightweight, 15-million-parameter decoder-only transformer trained entirely from scratch on biomedical text. The architecture follows the modern LLaMA/Mistral design and is optimised for efficient training and inference.

Despite its small parameter footprint, it implements state-of-the-art transformer components including Grouped-Query Attention (GQA), Rotary Positional Embeddings (RoPE), SwiGLU feed-forward networks, and RMSNorm. Additionally, the model uses a custom Byte-Pair Encoding (BPE) tokenizer trained specifically on PubMed abstracts, so complex biomedical terminology is encoded into fewer subword pieces.
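
BPE builds its vocabulary by repeatedly merging the most frequent adjacent symbol pair, which is why a tokenizer trained on PubMed text keeps terms like "cardiac" in fewer pieces than a general-purpose one. A minimal sketch of the merge step (the toy corpus and frequencies below are illustrative, not taken from the actual tokenizer):

```python
from collections import Counter

def get_pair_counts(corpus):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    counts = Counter()
    for word, freq in corpus.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts

def merge_pair(corpus, pair):
    """Replace every occurrence of the pair with its merged symbol."""
    a, b = pair
    return {word.replace(f"{a} {b}", f"{a}{b}"): freq for word, freq in corpus.items()}

# Toy corpus: words as space-separated characters, with made-up frequencies.
corpus = {"c a r d i a c": 10, "c a r d i o": 6, "c a r e": 3}
for _ in range(3):
    counts = get_pair_counts(corpus)
    best = max(counts, key=counts.get)  # most frequent adjacent pair
    corpus = merge_pair(corpus, best)
# After three merges, the frequent stem "card" has become a single symbol.
```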

  • Developer: Rohan Singh
  • GitHub Repository: rohansingh-dev/health_slm
  • Model Type: Decoder-only core language model
  • Language: English
  • License: MIT
  • Architecture: LLaMA-style autoregressive transformer

Model Architecture Details

  • Total Parameters: ~15.3M
  • Layers (Blocks): 12
  • Hidden Dimension: 256
  • Attention Heads: 4 (Query), 2 (Key/Value via GQA)
  • Head Dimension: 64
  • Feed-Forward Dimension: 512 (SwiGLU)
  • Context Length: 1024 tokens
  • Vocabulary Size: 32,000
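
These hyperparameters roughly account for the quoted ~15.3M total. A back-of-the-envelope count, assuming tied input/output embeddings and bias-free projections (both standard in LLaMA-style models, though not stated explicitly in this card):

```python
vocab, d_model, n_layers = 32_000, 256, 12
n_q_heads, n_kv_heads, d_head = 4, 2, 64
d_ff = 512

embedding = vocab * d_model                       # token embeddings (tied with LM head)
attn = (d_model * n_q_heads * d_head              # W_q
        + 2 * d_model * n_kv_heads * d_head       # W_k, W_v (GQA: fewer KV heads)
        + n_q_heads * d_head * d_model)           # W_o
ffn = 3 * d_model * d_ff                          # SwiGLU: gate, up, and down projections
norms = 2 * d_model                               # two RMSNorms per block
per_layer = attn + ffn + norms

total = embedding + n_layers * per_layer + d_model  # + final RMSNorm
print(f"{total / 1e6:.2f}M parameters")             # ≈ 15.28M
```

An untied LM head would add another vocab × d_model ≈ 8.19M parameters on top of this.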

Intended Uses & Limitations

Intended Uses

  • Educational / Research: HealthSLM serves as a lightweight, fully understandable baseline for training biomedical language models from scratch.
  • Biomedical NLP: Extracting or generating simple biomedical concepts and text.
  • Parameter-Efficient Fine-Tuning (PEFT): The model natively supports LoRA fine-tuning for downstream medical QA, summarisation, or instruction-following.

Limitations & Biases

  • Medical Disclaimer: This model is NOT intended to provide clinical diagnoses, medical advice, or treatment recommendations. Its outputs should be strictly considered experimental and must be verified by a licensed human medical professional.
  • Scale Limitations: At just 15 million parameters, the model is significantly smaller than foundational models (e.g., Llama-3 8B) and has limited reasoning abilities. It may hallucinate or generate nonsensical text, particularly on long-context tasks.

Training Details

Pre-training Data

The model was pre-trained on a corpus of biomedical abstracts fetched directly from the NCBI / PubMed FTP servers.

Training Setup

  • Hardware: NVIDIA RTX 4060 Mobile (6GB VRAM)
  • Optimizer: AdamW
  • Precision: bf16
  • Learning Rate Schedule: Cosine decay with warmup (Peak LR: 2e-4)
  • Weight Decay: 0.1
  • Gradient Accumulation: employed to reach an effective batch size of 64
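
The warmup-plus-cosine schedule can be sketched as follows. Only the peak LR (2e-4) comes from this card; the warmup length, total steps, and minimum LR below are illustrative assumptions:

```python
import math

PEAK_LR = 2e-4  # peak learning rate from the training setup above

def lr_at(step, warmup_steps=1_000, total_steps=20_000, min_lr=2e-5):
    """Linear warmup to the peak LR, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return PEAK_LR * (step + 1) / warmup_steps
    progress = min((step - warmup_steps) / (total_steps - warmup_steps), 1.0)
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1 + math.cos(math.pi * progress))
```

The schedule ramps up linearly, peaks at the end of warmup, and decays smoothly to the floor value by the final step.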

Getting Started

Because HealthSLM uses a custom PyTorch architecture, you need the model definition (model.py) from the repository to run it locally, or you can use the repository's pipeline inference code directly.
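
Under the hood, pipeline inference for a decoder-only model is a standard autoregressive loop: run the model over the current token ids, pick the next token from the last position's logits, append, and repeat. A minimal sketch of that loop with a stub in place of the real model (the stub, token ids, and function names are placeholders, not the actual HealthSLM API):

```python
def greedy_generate(logits_fn, prompt_ids, max_new_tokens, eos_id):
    """Autoregressive greedy decoding over integer token ids."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = logits_fn(ids)  # forward pass over the full context
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if next_id == eos_id:
            break
        ids.append(next_id)
    return ids

# Stub standing in for the model: always prefers token (last_id + 1), capped at 5.
def stub_logits(ids):
    scores = [0.0] * 8
    scores[min(ids[-1] + 1, 5)] = 1.0
    return scores

out = greedy_generate(stub_logits, prompt_ids=[1], max_new_tokens=10, eos_id=5)
# Generates 2, 3, 4, then stops when the stub proposes the EOS id.
```

With the real model, `logits_fn` would be a forward pass through the HealthSLM network, and sampling (temperature, top-k) would typically replace the plain argmax.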

Fine-Tuning and Quantization

The model supports loading in 4-bit precision (nf4) via bitsandbytes, which reduces its memory footprint to a few megabytes. It also supports LoRA (rank 16) for parameter-efficient downstream fine-tuning.
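
Both claims check out on paper. A quick arithmetic sketch (parameter counts taken from this card; nf4's per-group scale overhead is ignored for simplicity):

```python
params = 15_300_000  # ~15.3M parameters, per the architecture section

# 4-bit (nf4) weights cost half a byte per parameter.
four_bit_mb = params * 0.5 / 1e6   # roughly 7-8 MB, vs ~61 MB in fp32
fp32_mb = params * 4 / 1e6

# LoRA at rank 16 on a 256x256 projection: two low-rank factors
# A (256x16) and B (16x256) replace a full 65,536-parameter update.
d_model, rank = 256, 16
lora_params = d_model * rank + rank * d_model  # 8,192 trainable
full_params = d_model * d_model                # 65,536 frozen
```

So a rank-16 adapter trains only about an eighth of each adapted projection's parameters while the base weights stay frozen (and, with bitsandbytes, quantized).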
