How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Alanine-nya/MOSS-Music-8B-Thinking-Q5_K_M-GGUF:Q5_K_M
# Run inference directly in the terminal:
llama-cli -hf Alanine-nya/MOSS-Music-8B-Thinking-Q5_K_M-GGUF:Q5_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Alanine-nya/MOSS-Music-8B-Thinking-Q5_K_M-GGUF:Q5_K_M
# Run inference directly in the terminal:
llama-cli -hf Alanine-nya/MOSS-Music-8B-Thinking-Q5_K_M-GGUF:Q5_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Alanine-nya/MOSS-Music-8B-Thinking-Q5_K_M-GGUF:Q5_K_M
# Run inference directly in the terminal:
./llama-cli -hf Alanine-nya/MOSS-Music-8B-Thinking-Q5_K_M-GGUF:Q5_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Alanine-nya/MOSS-Music-8B-Thinking-Q5_K_M-GGUF:Q5_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Alanine-nya/MOSS-Music-8B-Thinking-Q5_K_M-GGUF:Q5_K_M
Use Docker
docker model run hf.co/Alanine-nya/MOSS-Music-8B-Thinking-Q5_K_M-GGUF:Q5_K_M
Quick Links

⚠️ Important Note about Audio Understanding

This model is originally an audio-language model (MOSS-Music).
However, this GGUF repository only contains the LLM backbone, not the required multimodal projector (mmproj) file.

  • Text-only inference works perfectly (just like any other LLM).
  • Audio input is NOT supported with this GGUF file alone.
    To use audio understanding, you need a compatible mmproj file (which is not provided here and not yet available for MOSS-Music in llama.cpp).

If you attempt to feed audio using llama-mtmd-cli or similar, the model will not understand it.

We recommend:

  • Using the original Transformers version if you need audio capabilities.
  • Or waiting for a future release that includes the projector adapter.

Alanine-nya/MOSS-Music-8B-Thinking-Q5_K_M-GGUF

This model was converted to GGUF format from OpenMOSS-Team/MOSS-Music-8B-Thinking using llama.cpp via the ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux)

brew install llama.cpp

Invoke the llama.cpp server or the CLI.

CLI:

llama-cli --hf-repo Alanine-nya/MOSS-Music-8B-Thinking-Q5_K_M-GGUF --hf-file moss-music-8b-thinking-q5_k_m.gguf -p "The meaning to life and the universe is"

Server:

llama-server --hf-repo Alanine-nya/MOSS-Music-8B-Thinking-Q5_K_M-GGUF --hf-file moss-music-8b-thinking-q5_k_m.gguf -c 2048

Note: You can also use this checkpoint directly through the usage steps listed in the Llama.cpp repo as well.

Step 1: Clone llama.cpp from GitHub.

git clone https://github.com/ggerganov/llama.cpp

Step 2: Move into the llama.cpp folder and build it with LLAMA_CURL=1 flag along with other hardware-specific flags (for ex: LLAMA_CUDA=1 for Nvidia GPUs on Linux).

cd llama.cpp && LLAMA_CURL=1 make

Step 3: Run inference through the main binary.

./llama-cli --hf-repo Alanine-nya/MOSS-Music-8B-Thinking-Q5_K_M-GGUF --hf-file moss-music-8b-thinking-q5_k_m.gguf -p "The meaning to life and the universe is"

or

./llama-server --hf-repo Alanine-nya/MOSS-Music-8B-Thinking-Q5_K_M-GGUF --hf-file moss-music-8b-thinking-q5_k_m.gguf -c 2048
Downloads last month
247
GGUF
Model size
8B params
Architecture
qwen3
Hardware compatibility
Log In to add your hardware

5-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Alanine-nya/MOSS-Music-8B-Thinking-Q5_K_M-GGUF

Quantized
(1)
this model