Instructions to use Mungert/LFM2.5-Audio-1.5B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Mungert/LFM2.5-Audio-1.5B-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="Mungert/LFM2.5-Audio-1.5B-GGUF", filename="LFM2.5-Audio-1.5B-bf16.gguf", )
llm.create_chat_completion( messages = "\"sample1.flac\"" )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use Mungert/LFM2.5-Audio-1.5B-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M
Use Docker
docker model run hf.co/Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use Mungert/LFM2.5-Audio-1.5B-GGUF with Ollama:
ollama run hf.co/Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M
- Unsloth Studio
How to use Mungert/LFM2.5-Audio-1.5B-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Mungert/LFM2.5-Audio-1.5B-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Mungert/LFM2.5-Audio-1.5B-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for Mungert/LFM2.5-Audio-1.5B-GGUF to start chatting
- Pi
How to use Mungert/LFM2.5-Audio-1.5B-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use Mungert/LFM2.5-Audio-1.5B-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M
Run Hermes
hermes
- Atomic Chat new
- Docker Model Runner
How to use Mungert/LFM2.5-Audio-1.5B-GGUF with Docker Model Runner:
docker model run hf.co/Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M
- Lemonade
How to use Mungert/LFM2.5-Audio-1.5B-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull Mungert/LFM2.5-Audio-1.5B-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.LFM2.5-Audio-1.5B-GGUF-Q4_K_M
List all available models
lemonade list
LFM2.5-Audio-1.5B GGUF Models
Model Generation Details
This model was generated using llama.cpp at commit 05fa625ea.
Click here to get info on choosing the right GGUF model format
LFM2.5‑Audio-1.5B
LFM2.5-Audio-1.5B is Liquid AI's updated end-to-end audio foundation model.
Key improvements include a custom, LFM based audio detokenizer, llama.cpp compatible GGUFs for CPU inference, and better ASR and TTS performance.
LFM2.5-Audio is an end-to-end multimodal speech and text language model, and as such does not require separate ASR and TTS components. Designed with low latency and real time conversation in mind, at only 1.5 billion parameters LFM2.5-Audio enables seamless conversational interaction, achieving capabilities on par with much larger models. Our model consists of a pretrained LFM2.5 model as its multimodal backbone, along with a FastConformer based audio encoder to handle continuous audio inputs, and an RQ-transformer generating discrete tokens coupled with a lightweight audio detokenizer for audio output.
LFM2.5-Audio supports two distinct generation routines, each suitable for a set of tasks. Interleaved generation enables real-time speech-to-speech conversational chatbot capabilities, where audio generation latency is key. Sequential generation is suited for non-conversational tasks such as ASR or TTS, and allows the model to switch generated modality on the fly.
📄 Model details
| Property | |
|---|---|
| Parameters (LM only) | 1.2B |
| Audio encoder | FastConformer (115M, canary-180m-flash) |
| Backbone layers | hybrid conv+attention |
| Audio detokenizer | Mimi-compatible, using 8 codebooks |
| Context | 32,768 tokens |
| Vocab size | 65,536 (text) / 2049*8 (audio) |
| Precision | bfloat16 |
| License | LFM Open License v1.0 |
Supported languages: English
🏃 How to run LFM2.5-Audio
Install the liquid-audio package via pip
pip install liquid-audio
pip install "liquid-audio [demo]" # optional, to install demo dependencies
pip install flash-attn --no-build-isolation # optional, to use flash attention 2. Will fallback to torch SDPA if not installed
Gradio demo
The simplest way to get started is by running the Gradio demo interface. After installation, run the command
liquid-audio-demo
This will start a webserver on port 7860. The interface can then be accessed via the URL http://localhost:7860/.
Multi-turn, multi-modal chat
The liquid-audio library provides a lower lever interface to the model and generation routines, ideal for custom usecases.
We demonstrate this with a simple multi-turn chat, where the first turn is given as audio, and the second turn is given as text.
For multi-turn chat with text and audio output, we use interleaved generation. The system prompt should be set to Respond with interleaved text and audio.. Here we use audio as the first user turn, and text as the second one.
import torch
import torchaudio
from liquid_audio import LFM2AudioModel, LFM2AudioProcessor, ChatState, LFMModality
# Load models
HF_REPO = "LiquidAI/LFM2.5-Audio-1.5B"
processor = LFM2AudioProcessor.from_pretrained(HF_REPO).eval()
model = LFM2AudioModel.from_pretrained(HF_REPO).eval()
# Set up inputs for the model
chat = ChatState(processor)
chat.new_turn("system")
chat.add_text("Respond with interleaved text and audio.")
chat.end_turn()
chat.new_turn("user")
wav, sampling_rate = torchaudio.load("assets/question.wav")
chat.add_audio(wav, sampling_rate)
chat.end_turn()
chat.new_turn("assistant")
# Generate text and audio tokens.
text_out: list[torch.Tensor] = []
audio_out: list[torch.Tensor] = []
modality_out: list[LFMModality] = []
for t in model.generate_interleaved(**chat, max_new_tokens=512, audio_temperature=1.0, audio_top_k=4):
if t.numel() == 1:
print(processor.text.decode(t), end="", flush=True)
text_out.append(t)
modality_out.append(LFMModality.TEXT)
else:
audio_out.append(t)
modality_out.append(LFMModality.AUDIO_OUT)
# output: Sure! How about "Handcrafted Woodworking, Precision Made for You"? Another option could be "Quality Woodworking, Quality Results." If you want something more personal, you might try "Your Woodworking Needs, Our Expertise."
# Detokenize audio, removing the last "end-of-audio" codes
# Mimi returns audio at 24kHz
audio_codes = torch.stack(audio_out[:-1], 1).unsqueeze(0)
waveform = processor.decode(audio_codes)
torchaudio.save("answer1.wav", waveform.cpu(), 24_000)
# Append newly generated tokens to chat history
chat.append(
text = torch.stack(text_out, 1),
audio_out = torch.stack(audio_out, 1),
modality_flag = torch.tensor(modality_out),
)
chat.end_turn()
# Start new turn
chat.new_turn("user")
chat.add_text("My business specialized in chairs, can you give me something related to that?")
chat.end_turn()
chat.new_turn("assistant")
# Generate second turn text and audio tokens.
audio_out: list[torch.Tensor] = []
for t in model.generate_interleaved(**chat, max_new_tokens=512, audio_temperature=1.0, audio_top_k=4):
if t.numel() == 1:
print(processor.text.decode(t), end="", flush=True)
else:
audio_out.append(t)
# output: Sure thing! How about “Comfortable Chairs, Crafted with Care” or “Elegant Seats, Handcrafted for You”? Let me know if you’d like a few more options.
# Detokenize second turn audio, removing the last "end-of-audio" codes
audio_codes = torch.stack(audio_out[:-1], 1).unsqueeze(0)
waveform = processor.decode(audio_codes)
torchaudio.save("answer2.wav", waveform.cpu(), 24_000)
ASR, TTS, additional information
Please visit the liquid-audio package repository for additional examples and sample audio snippets.
📈 Performance
VoiceBench (audio input)
Higher is better. AlpacaEval, CommonEval and WildVoice are scored out of 5.
| Model | Components & Size | AlpacaEval | CommonEval | WildVoice | SD-QA | MMSU | OBQA | BBH | IFEval | ADVBench | Overall |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LFM2.5-Audio-1.5B | 1.5B parameters | 3.76 | 3.53 | 3.15 | 30.92 | 33.04 | 44.40 | 49.54 | 28.48 | 99.04 | 54.92 |
| LFM2-Audio-1.5B | 1.5B parameters | 3.71 | 3.49 | 3.17 | 30.56 | 31.95 | 44.40 | 30.54 | 31.23 | 97.31 | 52.77 |
| Moshi | 7B parameters | 2.01 | 1.60 | 1.30 | 15.64 | 24.04 | 25.93 | 47.40 | 10.12 | 44.23 | 29.51 |
| Qwen2.5-Omni-3B | 5B parameters | 3.72 | 3.51 | 3.42 | 44.94 | 55.29 | 76.26 | 61.30 | 32.90 | 88.46 | 63.57 |
| Mini-Omni2 | 0.6B parameters | 2.32 | 2.18 | 1.79 | 9.31 | 24.27 | 26.59 | 46.40 | 11.56 | 57.50 | 33.49 |
ASR
Word Error Rate (WER), lower is better.
| Model | Components & Size | Audio output | Open | WER average | AMI | Earnings22 | GigaSpeech | LibriSpeech-clean | LibriSpeech-other | SPGISpeech | TED-LIUM | VoxPopuli |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LFM2.5-Audio-1.5B | 1.5B parameters | Yes | Yes | 7.53 | 15.63 | 14.56 | 10.47 | 1.95 | 4.30 | 2.76 | 3.47 | 7.13 |
| LFM2-Audio-1.5B | 1.5B parameters | Yes | Yes | 9.38 | 15.36 | 19.75 | 10.63 | 2.03 | 4.39 | 4.17 | 3.56 | 9.93 |
| Qwen2.5-Omni-3B | 5B parameters | Yes | Yes | 7.90 | 15.05 | 14.81 | 11.76 | 2.14 | 4.52 | 3.24 | 5.08 | 5.60 |
| Whisper-large-V3 | 1.5B parameters | No — ASR only | Yes | 7.44 | 15.95 | 11.29 | 10.02 | 2.01 | 3.91 | 2.94 | 3.86 | 9.54 |
📬 Contact
If you are interested in custom solutions with edge deployment, please contact our sales team.
License
The code in this the package repository and associated weights are licensed under the LFM Open License v1.0.
The code for the audio encoder is based on Nvidia NeMo, licensed under Apache 2.0, and the canary-180m-flash checkpoint, licensed under CC-BY 4.0. To simplify dependency resolution, we also ship the Python code of Kyutai Mimi, licensed under the MIT License. We also redistribute weights for Kyutai Mimi, licensed under CC-BY-4.0.
Citation
@article{liquidai2025lfm2,
title={LFM2 Technical Report},
author={Liquid AI},
journal={arXiv preprint arXiv:2511.23404},
year={2025}
}
🚀 If you find these models useful
Help me test my AI-Powered Quantum Network Monitor Assistant with quantum-ready security checks:
The full Open Source Code for the Quantum Network Monitor Service available at my github repos ( repos with NetworkMonitor in the name) : Source Code Quantum Network Monitor. You will also find the code I use to quantize the models if you want to do it yourself GGUFModelBuilder
💬 How to test:
Choose an AI assistant type:
TurboLLM(GPT-4.1-mini)HugLLM(Hugginface Open-source models)TestLLM(Experimental CPU-only)
What I’m Testing
I’m pushing the limits of small open-source models for AI network monitoring, specifically:
- Function calling against live network services
- How small can a model go while still handling:
- Automated Nmap security scans
- Quantum-readiness checks
- Network Monitoring tasks
🟡 TestLLM – Current experimental model (llama.cpp on 2 CPU threads on huggingface docker space):
- ✅ Zero-configuration setup
- ⏳ 30s load time (slow inference but no API costs) . No token limited as the cost is low.
- 🔧 Help wanted! If you’re into edge-device AI, let’s collaborate!
Other Assistants
🟢 TurboLLM – Uses gpt-4.1-mini :
- **It performs very well but unfortunatly OpenAI charges per token. For this reason tokens usage is limited.
- Create custom cmd processors to run .net code on Quantum Network Monitor Agents
- Real-time network diagnostics and monitoring
- Security Audits
- Penetration testing (Nmap/Metasploit)
🔵 HugLLM – Latest Open-source models:
- 🌐 Runs on Hugging Face Inference API. Performs pretty well using the lastest models hosted on Novita.
💡 Example commands you could test:
"Give me info on my websites SSL certificate""Check if my server is using quantum safe encyption for communication""Run a comprehensive security audit on my server"- '"Create a cmd processor to .. (what ever you want)" Note you need to install a Quantum Network Monitor Agent to run the .net code on. This is a very flexible and powerful feature. Use with caution!
Final Word
I fund the servers used to create these model files, run the Quantum Network Monitor service, and pay for inference from Novita and OpenAI—all out of my own pocket. All the code behind the model creation and the Quantum Network Monitor project is open source. Feel free to use whatever you find helpful.
If you appreciate the work, please consider buying me a coffee ☕. Your support helps cover service costs and allows me to raise token limits for everyone.
I'm also open to job opportunities or sponsorship.
Thank you! 😊
- Downloads last month
- 761
Model tree for Mungert/LFM2.5-Audio-1.5B-GGUF
Base model
LiquidAI/LFM2-1.2B