GGUF missing nextn tensors: embed_tokens and shared_head_head (llama.cpp fork fails to load)

#2 opened by miniwithmama

Issue

The GGUF files (e.g., EXAONE-4.5-33B-Q4_K_M.gguf) cannot be loaded by the referenced llama.cpp fork (nuxlear/llama.cpp@add-exaone4_5) due to missing nextn tensors.

Error

model has unused tensor blk.64.ffn_gate.weight -- ignoring
model has unused tensor blk.64.ffn_down.weight -- ignoring
model has unused tensor blk.64.ffn_up.weight -- ignoring
model has unused tensor blk.64.post_ffw_norm.weight -- ignoring
llama_model_load: error loading model: missing tensor 'blk.%d.nextn.eh_proj'

Root Cause

The GGUF contains 4 nextn tensors in blk.64:

  • blk.64.nextn.eh_proj.weight ✅
  • blk.64.nextn.enorm.weight ✅
  • blk.64.nextn.hnorm.weight ✅
  • blk.64.nextn.shared_head_norm.weight ✅

But the forked llama.cpp code (src/llama-arch.cpp, lines 437-442) expects 6 nextn tensors:

  • blk.%d.nextn.eh_proj ✅
  • blk.%d.nextn.embed_tokens ❌ MISSING
  • blk.%d.nextn.enorm ✅
  • blk.%d.nextn.hnorm ✅
  • blk.%d.nextn.shared_head_head ❌ MISSING
  • blk.%d.nextn.shared_head_norm ✅

This causes a tensor count mismatch: the loader expects 723 tensors but the GGUF provides 719. The 4 missing entries could be the 2 absent tensor types counted across 2 nextn layers, or simply 4 individual tensor entries.
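The present/absent split above can be double-checked directly. A minimal sketch using the `gguf` Python package that ships with llama.cpp (`pip install gguf`); the model path is the file from this report, and the script only reads tensor metadata, not the weights:

```python
# Sketch: confirm which nextn tensors a GGUF file actually contains.
# Assumes the `gguf` Python package from llama.cpp (pip install gguf).
import os

MODEL_PATH = "EXAONE-4.5-33B-Q4_K_M.gguf"


def nextn_tensor_names(names):
    """Return the sorted subset of tensor names belonging to a nextn block."""
    return sorted(n for n in names if ".nextn." in n)


if __name__ == "__main__" and os.path.exists(MODEL_PATH):
    from gguf import GGUFReader  # reads GGUF metadata and tensor infos

    reader = GGUFReader(MODEL_PATH)
    for name in nextn_tensor_names(t.name for t in reader.tensors):
        print(name)  # per this report: 4 names, not the 6 the loader wants
```

Running this against the Q4_K_M file should print only the 4 tensors listed above, confirming the gap on the conversion side rather than in the quantization step.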

Environment

  • Hardware: NVIDIA DGX Spark GB10 (Blackwell SM121, 128GB)
  • llama.cpp fork: nuxlear/llama.cpp@add-exaone4_5 (commit 3b12fcd1)
  • GGUF: EXAONE-4.5-33B-Q4_K_M.gguf from this repo
  • mmproj: mmproj-EXAONE-4.5-33B-BF16.gguf included
  • Also tested with upstream llama.cpp (latest); same error

Steps to Reproduce

git clone -b add-exaone4_5 https://github.com/nuxlear/llama.cpp
cd llama.cpp && cmake -B build -DGGML_CUDA=ON && cmake --build build -j$(nproc) --target llama-server

./build/bin/llama-server \
  -m EXAONE-4.5-33B-Q4_K_M.gguf \
  -mm mmproj-EXAONE-4.5-33B-BF16.gguf \
  -ngl 999 -c 8192 --port 8000 -a EXAONE-4.5-33B --jinja

Expected

Model loads successfully and serves via OpenAI-compatible API.

Actual

Model fails to load with missing tensor error.

Suggestion

Either:

  1. Include the missing embed_tokens and shared_head_head tensors in the GGUF conversion
  2. Update the forked llama.cpp to treat these tensors as optional
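Suggestion 2 amounts to an optional-tensor lookup at load time: required tensors still fail loudly, optional ones fall back to None. A sketch of that logic in Python; `get_tensor` is a hypothetical stand-in for the loader's lookup, not llama.cpp's actual API (upstream llama.cpp expresses the same idea with a not-required flag on its tensor-creation call):

```python
# Sketch of suggestion 2: tolerate the two absent nextn tensors.
# `get_tensor` is a hypothetical stand-in for the loader's lookup.
def get_tensor(tensors, name, required=True):
    """Look up a tensor by name; missing required tensors raise."""
    t = tensors.get(name)
    if t is None and required:
        raise ValueError(f"missing tensor '{name}'")
    return t


# Simulate the GGUF from this report: eh_proj present, embed_tokens absent.
loaded = {"blk.64.nextn.eh_proj.weight": object()}

eh_proj = get_tensor(loaded, "blk.64.nextn.eh_proj.weight")
embed_tokens = get_tensor(loaded, "blk.64.nextn.embed_tokens.weight",
                          required=False)
assert embed_tokens is None  # loading continues instead of aborting
```

When an optional nextn tensor comes back empty, the fork could presumably fall back to the model's main embedding and output head, which would also explain why the conversion omitted these large duplicate tensors in the first place.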
LG AI Research org

Hello, @miniwithmama. Thank you for your contribution!

We found that some nextn tensors were missing in the architecture definition.
We added a hotfix commit to our fork and confirmed that the example code works properly.

Could you please try again after updating the repository?

