GGUF missing nextn tensors: embed_tokens and shared_head_head (llama.cpp fork fails to load)
Issue
The GGUF files (e.g., EXAONE-4.5-33B-Q4_K_M.gguf) cannot be loaded by the official llama.cpp fork (nuxlear/llama.cpp@add-exaone4_5) because nextn tensors are missing from the files.
Error
model has unused tensor blk.64.ffn_gate.weight -- ignoring
model has unused tensor blk.64.ffn_down.weight -- ignoring
model has unused tensor blk.64.ffn_up.weight -- ignoring
model has unused tensor blk.64.post_ffw_norm.weight -- ignoring
llama_model_load: error loading model: missing tensor 'blk.%d.nextn.eh_proj'
Root Cause
The GGUF contains 4 nextn tensors in blk.64:
- blk.64.nextn.eh_proj.weight
- blk.64.nextn.enorm.weight
- blk.64.nextn.hnorm.weight
- blk.64.nextn.shared_head_norm.weight
But the forked llama.cpp code (src/llama-arch.cpp, lines 437-442) expects 6 nextn tensors:
- blk.%d.nextn.eh_proj
- blk.%d.nextn.embed_tokens (MISSING)
- blk.%d.nextn.enorm
- blk.%d.nextn.hnorm
- blk.%d.nextn.shared_head_head (MISSING)
- blk.%d.nextn.shared_head_norm
This causes a tensor count mismatch: expected 723, got 719 (4 missing; possibly 2 tensors × 2 nextn layers, or 4 separate tensor entries).
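The mismatch can be reproduced from the tensor names alone. A minimal Python sketch (the tensor names come from this report; the helper function is illustrative, not part of llama.cpp or the gguf tooling):

```python
# Compare the nextn tensors present in the GGUF against the six names
# the forked llama.cpp expects per nextn layer (per src/llama-arch.cpp).
EXPECTED_NEXTN = [
    "nextn.eh_proj",
    "nextn.embed_tokens",
    "nextn.enorm",
    "nextn.hnorm",
    "nextn.shared_head_head",
    "nextn.shared_head_norm",
]

def missing_nextn(tensor_names, layer):
    """Return the expected nextn suffixes absent from tensor_names."""
    present = set(tensor_names)
    return [s for s in EXPECTED_NEXTN
            if f"blk.{layer}.{s}.weight" not in present]

# Tensor names actually found in EXAONE-4.5-33B-Q4_K_M.gguf (blk.64):
gguf_tensors = [
    "blk.64.nextn.eh_proj.weight",
    "blk.64.nextn.enorm.weight",
    "blk.64.nextn.hnorm.weight",
    "blk.64.nextn.shared_head_norm.weight",
]
print(missing_nextn(gguf_tensors, layer=64))
# → ['nextn.embed_tokens', 'nextn.shared_head_head']
```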
Environment
- Hardware: NVIDIA DGX Spark GB10 (Blackwell SM121, 128GB)
- llama.cpp fork: nuxlear/llama.cpp@add-exaone4_5 (commit 3b12fcd1)
- GGUF: EXAONE-4.5-33B-Q4_K_M.gguf from this repo
- mmproj: mmproj-EXAONE-4.5-33B-BF16.gguf (included)
- Also tested with upstream llama.cpp (latest): same error
Steps to Reproduce
git clone -b add-exaone4_5 https://github.com/nuxlear/llama.cpp
cd llama.cpp && cmake -B build -DGGML_CUDA=ON && cmake --build build -j$(nproc) --target llama-server
./build/bin/llama-server \
-m EXAONE-4.5-33B-Q4_K_M.gguf \
-mm mmproj-EXAONE-4.5-33B-BF16.gguf \
-ngl 999 -c 8192 --port 8000 -a EXAONE-4.5-33B --jinja
Expected
Model loads successfully and serves via OpenAI-compatible API.
Actual
Model fails to load with missing tensor error.
Suggestion
Either:
- Include the missing embed_tokens and shared_head_head tensors in the GGUF conversion
- Or update the forked llama.cpp to make these tensors optional
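For the second option, llama.cpp generally supports marking a tensor as not required at creation time (upstream uses a TENSOR_NOT_REQUIRED flag, though I have not verified the fork's code). The intended loader behavior can be modeled generically; the sketch below is an illustration in Python, not llama.cpp's actual API, and the OPTIONAL_SUFFIXES set is an assumption:

```python
# Illustrative model of "optional tensor" loading: skip absent optional
# tensors, raise only when a required tensor is missing.
OPTIONAL_SUFFIXES = {"nextn.embed_tokens", "nextn.shared_head_head"}

def load_nextn_tensors(available, layer):
    """Return names of loaded nextn tensors; raise on missing required ones."""
    expected = [
        "nextn.eh_proj", "nextn.embed_tokens", "nextn.enorm",
        "nextn.hnorm", "nextn.shared_head_head", "nextn.shared_head_norm",
    ]
    loaded = []
    for suffix in expected:
        name = f"blk.{layer}.{suffix}.weight"
        if name in available:
            loaded.append(name)
        elif suffix not in OPTIONAL_SUFFIXES:
            raise ValueError(f"missing tensor '{name}'")
    return loaded

avail = {
    "blk.64.nextn.eh_proj.weight",
    "blk.64.nextn.enorm.weight",
    "blk.64.nextn.hnorm.weight",
    "blk.64.nextn.shared_head_norm.weight",
}
print(len(load_nextn_tensors(avail, 64)))  # loads the 4 present tensors
```

With this behavior the GGUF from this repo would load as-is, at the cost of the nextn path falling back when the optional tensors are absent.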
Hello @miniwithmama, thank you for your contribution!
We found that some nextn tensors were missing in the architecture definition.
We added a hotfix commit to our fork and confirmed that the example code works properly.
Could you please try again after updating the repository?