crashed

#2
by androidli - opened

Total : 20029.33, 2406.81, 22436.14 MiB
Memory required for model tensors + cache: 22834 MiB
Memory available on all devices - compute: 22799 MiB
llm_load_tensors: ggml ctx size = 0.61 MiB
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.11.attn_q.weight' not found
llama_model_load_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/DJLougen/Ornstein3.6-35B-A3B-RYS-SABER-GGUF/Ornstein3.6-35B-A3B-RYS-SABER-Q4_K_M.gguf'
ERR [ load_model] unable to load model | tid="127779220193280" timestamp=1776405417 model="models/DJLougen/Ornstein3.6-35B-A3B-RYS-SABER-GGUF/Ornstein3.6-35B-A3B-RYS-SABER-Q4_K_M.gguf"
free(): invalid pointer
Aborted (core dumped)

Same thing here with the Q4_K_S version.

load_tensors: loading model tensors, this can take a while... (mmap = false, direct_io = false)
llama_model_load: error loading model: missing tensor 'blk.11.attn_q.weight'
llama_model_load_from_file_impl: failed to load model

I can confirm this on the latest llama.cpp with Q4_K_M. Everything seems normal during load until it fails at this line:
```
...
create_tensor: loading tensor blk.11.attn_qkv.weight
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.11.attn_qkv.weight' has wrong shape; expected 2048, 9216, got 2048, 8192, 1, 1
llama_model_load_from_file_impl: failed to load model
...
```
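For anyone curious what this check is doing: after reading the GGUF header, llama.cpp verifies that every tensor the architecture expects (e.g. `blk.11.attn_q.weight`) is present with the expected dimensions, and aborts the load otherwise. Here is a minimal Python sketch of that kind of validation — the tensor table and expected shapes below are hypothetical stand-ins, not read from the actual file:

```python
# Sketch of the kind of validation behind llama.cpp's check_tensor_dims:
# every expected per-layer tensor must exist and have the expected shape.
# The tensor table below is a hypothetical stand-in for a GGUF's contents.

def check_tensors(tensors, n_layers, expected):
    """Return a list of error strings, one per missing or misshapen tensor."""
    errors = []
    for i in range(n_layers):
        for suffix, shape in expected.items():
            name = f"blk.{i}.{suffix}"
            if name not in tensors:
                errors.append(f"missing tensor '{name}'")
            elif tensors[name] != shape:
                errors.append(
                    f"tensor '{name}' has wrong shape; "
                    f"expected {shape}, got {tensors[name]}"
                )
    return errors

# Toy example: layer 1's attn_q.weight has the wrong shape,
# mimicking the error reported above.
tensors = {
    "blk.0.attn_q.weight": (2048, 4096),
    "blk.0.attn_k.weight": (2048, 512),
    "blk.1.attn_q.weight": (2048, 8192),  # wrong second dimension
    "blk.1.attn_k.weight": (2048, 512),
}
expected = {"attn_q.weight": (2048, 4096), "attn_k.weight": (2048, 512)}
for err in check_tensors(tensors, n_layers=2, expected=expected):
    print(err)
```

Since the same tensor is reported missing (or misshapen) by every loader, the metadata in the GGUF itself is inconsistent with what the declared architecture expects, which points at the conversion step rather than at any particular runtime.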

EDIT: Same happens with Q5_K_M. Shame, I really want to try this one out!
EDIT 2: I made my own Q5_K_M quant through HF from DJLougen/Ornstein3.6-35B-A3B-RYS-SABER, and it still doesn't work — so the problem is in that model itself. Pinging @DJLougen.

I'm having the same problem as the others, with Ollama, using the Q8 quant:
source=server.go:1218 msg="llm load error: failed to initialize model: qwen3next: layer 12 missing attn_qkv/attn_gate projections"

Ornstein3.6-35B-A3B-RYS-GGUF also fails to load; Ornstein3.6-35B-A3B-SABER-GGUF, however, loads fine.

EDIT: the instructions say this model needs a special fork of llama.cpp.
