[Bug] FP8 model fails to load in vLLM due to missing vision modules & broken architecture parsing
Hi, thanks for your great work!
I ran into a crash when loading the Qwopus3.5-27B-v3-FP8 version in vLLM. After debugging, I found that all vision modules (`model.visual.*`) were excluded entirely during the FP8 export.
Using it as a text-only model is perfectly fine for VRAM savings, but the current configuration is broken:
- `model_type` defaults to `qwen3_5_text`, which isn't natively registered in vLLM or Transformers.
- The safetensors keep the legacy `model.language_model.` prefixes instead of flattening them to `model.`, so weight loading fails.
- `tokenizer_config.json` incorrectly relies on `TokenizersBackend`.
Our workaround: we had to patch vLLM's local model registry and inject a dynamic `WeightsMapper` in Python that strips the `language_model` prefix from the tensor names before loading.
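For anyone hitting the same error, here is a minimal sketch of the key-remapping idea behind our patch. It assumes the checkpoint keys carry the legacy `model.language_model.` prefix described above; in vLLM itself we wired this up through a `WeightsMapper`, whose exact API varies by vLLM version, so the helper below only illustrates the rename step in plain Python:

```python
# Sketch: flatten legacy "model.language_model." key prefixes to "model."
# so a text-only loader can find the weights. The prefix constants below
# match the broken export we observed; adjust them for other checkpoints.

LEGACY_PREFIX = "model.language_model."
FLAT_PREFIX = "model."

def remap_key(name: str) -> str:
    """Rewrite one tensor name, leaving non-matching keys untouched."""
    if name.startswith(LEGACY_PREFIX):
        return FLAT_PREFIX + name[len(LEGACY_PREFIX):]
    return name

def remap_state_dict(state_dict: dict) -> dict:
    """Apply the rename to every key in a loaded state dict."""
    return {remap_key(k): v for k, v in state_dict.items()}

if __name__ == "__main__":
    demo = {"model.language_model.layers.0.mlp.gate_proj.weight": None,
            "lm_head.weight": None}
    print(sorted(remap_state_dict(demo)))
```

In vLLM this same mapping can be expressed declaratively (roughly `WeightsMapper(orig_to_new_prefix={"model.language_model.": "model."})`), which is the approach we took in our local patch.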
Question: are there plans to release a fully functional multimodal FP8 version that includes the vision tower? Or, alternatively, a properly normalized text-only export, so we don't have to patch the vLLM source code? Thanks!
Thanks a lot for your detailed feedback — I really appreciate you taking the time to debug this so thoroughly.
I've noticed the issues with the quantized (FP8) version as well. Please give us a little time — my friend Kyle is currently uploading a fixed version of the model.
Thanks again for pointing this out!
Awesome, thanks for the super quick response and fix! Looking forward to testing Kyle's updated version. Really appreciate all the hard work you guys are putting into this model!
I hope the model's multimodal capabilities can be restored.
| File | unsloth | Jackrong | Need to restore |
| --- | --- | --- | --- |
| chat_template | β | β (Simplified) | β |
| processor_config.json | β | β | - |
| preprocessor_config.json | β | β | β |
| video_preprocessor_config.json | β | β | β |
I think you can directly use mconcat/Qwopus3.5-27B-v3-FP8-Dynamic on Hugging Face.