[Bug] FP8 model fails to load in vLLM due to missing vision modules & broken architecture parsing
Hi, thanks for your great work!
I ran into a crash when loading the Qwopus3.5-27B-v3-FP8 version in vLLM. After debugging, I found that all vision modules (`model.visual.*`) were excluded entirely during the FP8 export.
Using it as a text-only model is perfectly fine for VRAM savings, but the current configuration is broken:
- `model_type` defaults to `qwen3_5_text`, which isn't natively registered in vLLM or Transformers.
- The safetensors keep the legacy `model.language_model.` prefixes instead of flattening them to `model.`, so weight loading fails.
- `tokenizer_config.json` incorrectly relies on `TokenizersBackend`.
Our workaround: we had to patch vLLM's local model registry and inject a dynamic `WeightsMapper` in Python that strips the `language_model` prefix from the tensor names before loading.
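For anyone hitting the same error, here is a minimal sketch of the key-remapping idea behind our patch. It assumes the checkpoint keys carry the legacy `model.language_model.` prefix described above; in vLLM itself we wired this up through a `WeightsMapper`, whose exact API varies by vLLM version, so the helper below only illustrates the rename step in plain Python:

```python
# Sketch: flatten legacy "model.language_model." key prefixes to "model."
# so a text-only loader can find the weights. The prefix constants below
# match the broken export we observed; adjust them for other checkpoints.

LEGACY_PREFIX = "model.language_model."
FLAT_PREFIX = "model."

def remap_key(name: str) -> str:
    """Rewrite one tensor name, leaving non-matching keys untouched."""
    if name.startswith(LEGACY_PREFIX):
        return FLAT_PREFIX + name[len(LEGACY_PREFIX):]
    return name

def remap_state_dict(state_dict: dict) -> dict:
    """Apply the rename to every key in a loaded state dict."""
    return {remap_key(k): v for k, v in state_dict.items()}

if __name__ == "__main__":
    demo = {"model.language_model.layers.0.mlp.gate_proj.weight": None,
            "lm_head.weight": None}
    print(sorted(remap_state_dict(demo)))
```

In vLLM this same mapping can be expressed declaratively (roughly `WeightsMapper(orig_to_new_prefix={"model.language_model.": "model."})`), which is the approach we took in our local patch.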
Question: are there plans to release a fully functional multimodal FP8 version that includes the vision tower? Or, alternatively, a properly normalized text-only export, so we don't have to patch the vLLM source code? Thanks!
Thanks a lot for your detailed feedback — I really appreciate you taking the time to debug this so thoroughly.
I've noticed the issues with the quantized (FP8) version as well. Please give us a little time — my friend Kyle is currently uploading a fixed version of the model.
Thanks again for pointing this out!
Awesome, thanks for the super quick response and fix! Looking forward to testing Kyle's updated version. Really appreciate all the hard work you guys are putting into this model!
I hope the model's multimodal capabilities can be restored.
| File | unsloth | Jackrong | Need to restore |
| --- | --- | --- | --- |
| chat_template | β | β (Simplified) | β |
| processor_config.json | β | β | - |
| preprocessor_config.json | β | β | β |
| video_preprocessor_config.json | β | β | β |
I think you can directly use mconcat/Qwopus3.5-27B-v3-FP8-Dynamic on Hugging Face.