---
library_name: transformers
license: other
license_name: lfm1.0
license_link: LICENSE
language:
  - en
  - ar
  - zh
  - fr
  - de
  - ja
  - ko
  - es
pipeline_tag: text-generation
tags:
  - liquid
  - lfm2.5
  - edge
  - gguf
---

# LFM2.5-1.2B-Instruct (W.I.P.)

LFM2.5 is a new family of hybrid models designed for on-device deployment. It builds on the LFM2 architecture with extended pre-training and reinforcement learning.

## Highlights

- Best performance among sub-2B models, particularly in instruction following.
- 2x faster inference on CPU than Qwen3, with optimized prefill and decode speeds.
- Hybrid architecture combining convolution and attention blocks (a quick way to inspect this is sketched below).
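
As a quick sanity check of the hybrid layout, you can print the published configuration. This is a minimal sketch assuming only the public `AutoConfig` API; the exact field that lists per-layer block types (if any) depends on the released config:

```python
from transformers import AutoConfig

# Load and print the published config; the layer composition
# (convolution vs. attention blocks) should appear among its fields.
# Field names are not guaranteed here and may differ in the release.
config = AutoConfig.from_pretrained("LiquidAI/LFM2.5-1.2B-Instruct")
print(config)
```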

## LFM2.5 Family

In the LFM2.5 family, we release:

| Model | Description |
| --- | --- |
| LFM2.5-1.2B-Base | Pre-trained base model |
| LFM2.5-1.2B-Instruct | General-purpose chat model |
| LFM2.5-1.2B-JP | Japanese-optimized chat model |
| LFM2.5-VL-1.6B | Vision-language model |
| LFM2.5-1.5B-Audio | Audio-language model |

## Model Details

| Property | Value |
| --- | --- |
| Parameters | 1.17B |
| Context length | 32,768 tokens |
| Architecture | 16 layers (10 conv + 6 attn) |
| Vocabulary | 65,536 |
| Precision | bfloat16 |
| Training budget | 10T tokens |
| License | LFM Open License v1.0 |

**Supported languages:** English, Arabic, Chinese, French, German, Japanese, Korean, Spanish

**Recommended use cases:** Agentic tasks, data extraction, RAG, creative writing, multi-turn conversations

**Not recommended for:** Knowledge-intensive tasks, programming

## Quick Start

```bash
pip install -U transformers
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_id = "LiquidAI/LFM2.5-1.2B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="bfloat16")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build the prompt with the chat template
messages = [{"role": "user", "content": "What is C. elegans?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

# Generate with the recommended sampling parameters
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=512,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Recommended generation parameters: `temperature=0.3`, `min_p=0.15`, `repetition_penalty=1.05`

## Chat Template

LFM2.5 uses a ChatML-like format. See the Chat Template documentation for details.

```
<|startoftext|><|im_start|>system
You are a helpful assistant trained by Liquid AI.<|im_end|>
<|im_start|>user
What is C. elegans?<|im_end|>
<|im_start|>assistant
```

Use `tokenizer.apply_chat_template()` to automatically format your messages.
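
To inspect exactly what the template produces for a given conversation, render it to a string instead of token IDs. A minimal sketch reusing the `tokenizer` from the Quick Start:

```python
# Render the chat template as a string (no tokenization) to inspect
# the exact prompt the model will see.
messages = [{"role": "user", "content": "What is C. elegans?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)  # should match the format shown above
```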

## Tool Use

LFM2.5 supports function calling. See the Tool Use documentation for the full guide.

```python
# Describe the available tools as JSON schema
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
    }
}]

# Pass the tool definitions alongside the conversation
messages = [{"role": "user", "content": "What's the weather in Paris?"}]
input_ids = tokenizer.apply_chat_template(messages, tools=tools, add_generation_prompt=True, return_tensors="pt")
```
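
From here, generation proceeds as in the Quick Start: the model is expected to emit a tool call, which your application parses, executes, and feeds back as a new message. A minimal sketch reusing `model` and `tokenizer` from above; the exact tool-call output format is model-specific, so the parsing step is left abstract:

```python
# Generate; the model should respond with a tool call for get_weather.
output = model.generate(input_ids.to(model.device), max_new_tokens=256)
# Decode only the newly generated tokens.
response = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)  # parse the tool call, run the tool, append the result as a new message
```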

## Inference
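
The `gguf` tag indicates quantized builds for on-device runtimes such as llama.cpp. As an illustration only (the GGUF filename below is hypothetical; substitute an actual artifact from the repository), a quantized build can be run with `llama-cli` using the recommended sampling parameters:

```bash
# Hypothetical filename; use the actual GGUF file from this repo.
llama-cli -m LFM2.5-1.2B-Instruct-Q4_K_M.gguf \
  -p "What is C. elegans?" \
  --temp 0.3 --min-p 0.15 --repeat-penalty 1.05
```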

## Fine-tuning

We recommend fine-tuning on your specific use case for best results.

| Method | Link |
| --- | --- |
| SFT with Unsloth | Colab Notebook |
| SFT with TRL | Colab Notebook |
| DPO with TRL | Colab Notebook |

See the Fine-tuning documentation for more details.
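
For orientation, supervised fine-tuning with TRL comes down to a few lines. This is a minimal sketch, not the notebooks' actual recipe: the dataset and hyperparameters below are placeholders, and `SFTConfig` fields vary somewhat across TRL versions.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder chat dataset; substitute data from your own use case.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model="LiquidAI/LFM2.5-1.2B-Instruct",  # TRL loads the model from the Hub
    train_dataset=dataset,
    args=SFTConfig(output_dir="lfm2.5-1.2b-sft"),
)
trainer.train()
```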

## Benchmarks

| Model | GPQA | MMLU-Pro | IFEval | IFBench | Multi-IF | AIME25 | BFCLv3 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LFM2.5-1.2B-Instruct (BF16) | 35.81 | 44.76 | 85.76 | 47.33 | 61.22 | 12.33 | 49.04 |
| Qwen3-1.7B | 34.85 | 42.91 | 73.68 | 21.33 | 56.48 | 9.33 | 46.30 |
| Granite 4.0-1B | 24.24 | 33.53 | 79.61 | 21.00 | 43.65 | 3.33 | 52.43 |
| Llama 3.2 1B Instruct | 16.57 | 20.80 | 52.37 | 15.93 | 30.16 | 0.33 | 21.44 |
| Gemma 3 1B IT | 24.24 | 14.04 | 63.25 | 20.47 | 44.31 | 1.00 | 16.64 |

## Resources

## Contact

For enterprise solutions and edge deployment, contact sales@liquid.ai.

## Citation

```bibtex
@article{liquidai2025lfm2,
  title={LFM2 Technical Report},
  author={Liquid AI},
  journal={arXiv preprint arXiv:2511.23404},
  year={2025}
}
```