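CyberSpark2-7b-cabin-sft-v2-gptq-int8 Qwen2.5-VL-7B-GPTQ-Int8 (high-precision quantized version)

This is a multimodal cabin-assistant model based on Qwen2.5-VL-7B-Instruct (the larger-parameter variant), quantized to INT8 with GPTQ (high-quality calibration variant). It substantially reduces the GPU memory footprint of the 7B model, while a carefully curated calibration dataset preserves as much of the visual-recognition and instruction-recommendation accuracy as possible. It is suited to high-compute domain-controller platforms equipped with a discrete GPU.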

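Features

Multimodal: accepts image + text input, combined with cabin-scenario prompts.
Optimized for Chinese: supervised fine-tuning (SFT) on Chinese in-cabin interactions.
Action tags: the output contains both natural language and a structured tag/operation string, which makes it easier to integrate with head-unit control logic.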

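Use cases

Recognizing and soothing occupant emotions, coordinated with comfort features such as ambient lighting and music.
Detecting distracted-driving behavior (for example phoning, smoking, or dozing off) and issuing a reminder.
Checking whether a child is wearing a seat belt and prompting when not.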

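Input/output format (recommended)

Uses a ChatML-style message structure, combined with the <image> marker and an operation string:

Input: a messages list + an images list (image paths or PIL.Image objects).
Output: natural language + a structured operation string, for example: <|SPEECH_END|> playOperation|PAUSE| ... or <|TAG|> phone_left_hand.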
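
As an illustration of the input structure described above, the following minimal sketch assembles one request record in Python. The helper name `build_request` is hypothetical, and the rule that the user prompt carries one `<image>` marker per image is an assumption inferred from the examples in this card, not a documented requirement.

```python
import json

# System prompt copied from the example prompts in this card.
SYSTEM_PROMPT = (
    "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, "
    "capable of perceiving auditory and visual inputs, as well as generating "
    "text and speech."
)


def build_request(user_prompt, image_paths):
    """Hypothetical helper: build a {"messages": ..., "images": ...} record."""
    # Assumption: one <image> marker per image, placed before the prompt text.
    missing = len(image_paths) - user_prompt.count("<image>")
    if missing < 0:
        raise ValueError("prompt contains more <image> markers than images")
    content = "<image>" * missing + user_prompt
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": content},
        ],
        "images": list(image_paths),
    }


record = build_request(
    "你现在是一个优秀的汽车座舱智能语音助手，你可以通过输入图片信息做出对应的回答和功能推荐。",
    ["/path/to/your/image.jpg"],
)
print(json.dumps(record, ensure_ascii=False, indent=2))
```

The prompt text stays in Chinese because the model was fine-tuned on Chinese cabin interactions.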

Example prompts

Emotion soothing and feature recommendation

{
  "messages": [
    {"role": "system", "content": "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech."},
    {"role": "user", "content": "<image>你现在是一个优秀的汽车座舱智能语音助手，你可以通过输入图片信息做出对应的回答和功能推荐。如果识别到了用户有开心、悲伤、愤怒、平静等情绪，可以安抚并给出功能推荐。"}
  ],
  "images": ["/data/LLM-SFT/datasets/face_emotion_dataset_0429/images/3675.jpg"]
}

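Expected output example (the spoken part reads roughly "I have kept the lighting a soft neutral color for you and put on a quiet background track"):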

我帮您保持柔和的中性色灯光，播放了一首安静的背景乐 <|SPEECH_END|> enableAmbientLight|true| setAmbientLightMode|3| playOperation|PLAY|tag|neutral|

Distracted-behavior reminder

{
  "messages": [
    {"role": "system", "content": "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech."},
    {"role": "user", "content": "<image>你现在是一个优秀的汽车座舱智能语音助手，你可以通过输入图片信息做出对应的回答和功能推荐。\n如果识别到了用户在吸烟、打瞌睡、接打电话等行为，可以做出提醒并给出功能推荐。"}
  ],
  "images": ["/data/DMS_CV/datasets/state-farm-distracted-driver-detection/imgs/val/c4/img_22127.jpg"]
}

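Expected output example (the spoken part reads roughly "You are on the phone; I have turned the music down for you, please focus on driving"):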

您正在打电话，我帮您把车内音乐调低了，请专心驾驶 <|SPEECH_END|> playOperation|PAUSE| setMediaVolume|6| setMute|true| <|TAG|> phone_left_hand

Child seat-belt detection

{
  "messages": [
    {"role": "system", "content": "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech."},
    {"role": "user", "content": "<image>你现在是一个优秀的汽车座舱智能语音助手，你可以通过输入图片信息做出对应的回答和功能推荐。看下是儿童是否正确佩戴安全带，若没有，请提醒用户。"}
  ],
  "images": ["/data/LLM-SFT/datasets/safetyBelt/correct-wear/0701_00027_ (137).png"]
}

Expected output example (the spoken part reads roughly "The child is wearing the seat belt correctly"):

检测到儿童已正确佩戴安全带 <|SPEECH_END|> tag|correct-wear|
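
The expected outputs above share one layout: spoken text, then <|SPEECH_END|>, then `name|argument|` pairs, optionally followed by <|TAG|> and a label. The sketch below splits a reply into those parts. It assumes every operation takes exactly one argument, which holds for the examples in this card but is not confirmed as a general rule; `CabinReply` and `parse_reply` are hypothetical names.

```python
from dataclasses import dataclass, field

SPEECH_END = "<|SPEECH_END|>"
TAG = "<|TAG|>"


@dataclass
class CabinReply:
    speech: str                                      # text to be spoken
    operations: list = field(default_factory=list)   # [(name, argument), ...]
    tag: str = ""                                    # label after <|TAG|>, if any


def parse_reply(text):
    """Split a model reply into speech, operation pairs, and an optional tag."""
    speech, _, rest = text.partition(SPEECH_END)
    ops_part, _, tag_part = rest.partition(TAG)
    # Fields are separated by "|"; whitespace between pairs is optional.
    fields = [f.strip() for f in ops_part.split("|") if f.strip()]
    if len(fields) % 2:
        raise ValueError(f"operation string has an unpaired field: {ops_part!r}")
    operations = list(zip(fields[0::2], fields[1::2]))
    return CabinReply(speech.strip(), operations, tag_part.strip())


reply = parse_reply(
    "您正在打电话，我帮您把车内音乐调低了，请专心驾驶 <|SPEECH_END|> "
    "playOperation|PAUSE| setMediaVolume|6| setMute|true| <|TAG|> phone_left_hand"
)
print(reply.operations, reply.tag)
```

Note that the examples carry the label in two forms: as a trailing <|TAG|> section and as a `tag|...|` pair inside the operation string. This sketch reports the former in `tag` and leaves the latter in `operations`, so the caller decides how to treat it.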

Python usage

Using Transformers with the bundled chat_template:

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from PIL import Image

repo_id = "XAILab-CyberSpark/CyberSpark2-7b-cabin-sft-v2-gptq-int8"
# Qwen2.5-VL is a vision-language model: use the processor (tokenizer + image
# preprocessing) and the conditional-generation class, not AutoModelForCausalLM.
# Loading GPTQ weights usually also needs a GPTQ backend (e.g. gptqmodel) installed.
processor = AutoProcessor.from_pretrained(repo_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Qwen, a virtual human developed by the Qwen Team, Alibaba Group, capable of perceiving auditory and visual inputs, as well as generating text and speech."},
    {"role": "user", "content": [
        # The {"type": "image"} entry takes the place of the <image> marker.
        {"type": "image"},
        {"type": "text", "text": "你现在是一个优秀的汽车座舱智能语音助手，你的名字是大众，可以通过输入图片信息做出对应的回答和功能推荐。"},
    ]},
]
query = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image = Image.open("/path/to/your/image.jpg").convert("RGB")
inputs = processor(text=[query], images=[image], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens; keep special tokens such as <|SPEECH_END|>.
new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(processor.decode(new_tokens, skip_special_tokens=False))

Files

model-00001/00002.safetensors and model.safetensors.index.json: weights and weight index.
config.json, generation_config.json: model and inference configuration.
tokenizer.json/tokenizer_config.json/merges.txt/vocab.json: tokenizer files.
chat_template.jinja: conversation template; using it through apply_chat_template is recommended.
preprocessor_config.json, video_preprocessor_config.json: multimodal preprocessing configuration.

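Resource requirements

A device with at least 16 GB of GPU memory is recommended for inference; loading with INT8 quantization is supported.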
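
As a rough sanity check on that recommendation, the weight footprint can be estimated from the parameter count and the bits per weight. The sketch below assumes a nominal 7e9 parameters (taken from the "7B" in the model name) and ignores quantization scales, any layers left unquantized, activations, and the KV cache, so the result is a lower bound on real memory use, not a measurement.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate weight storage in GiB for a given parameter count."""
    return num_params * bits_per_weight / 8 / 2**30


NOMINAL_PARAMS = 7e9  # assumption: nominal size implied by "7B"

print(round(weight_memory_gib(NOMINAL_PARAMS, 8), 1))   # INT8 weights: 6.5
print(round(weight_memory_gib(NOMINAL_PARAMS, 16), 1))  # FP16 weights: 13.0
```

Under these assumptions the INT8 weights take roughly half the space of FP16 weights, which leaves headroom on a 16 GB device for image features and generation state.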

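License and acknowledgements

Based on Qwen2.5-VL-7B-Instruct; please comply with its upstream license and usage restrictions.
The model is intended for research and demonstration only; integrating it into a vehicle head unit requires thorough safety assessment and validation.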

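Limitations and notes

Image-recognition results can be affected by lighting, occlusion, and camera angle.
The structured operation string must be mapped to what the actual head unit can do, to avoid unauthorized or mistaken actions.
Before use in a real driving environment, run comprehensive tests and set safety thresholds and fallback strategies.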
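
One way to approach the mapping and fallback points above is an allowlist: only operations the head unit explicitly supports, with arguments that pass a check, are forwarded, and everything else is rejected and logged. The sketch below is illustrative only. The operation names come from the examples in this card, while the accepted value sets and numeric ranges are hypothetical placeholders that a real integration would replace with the vehicle's actual limits.

```python
def int_in_range(lo, hi):
    """Return a checker that accepts integer strings within [lo, hi]."""
    def check(value):
        try:
            return lo <= int(value) <= hi
        except ValueError:
            return False
    return check


def one_of(*choices):
    """Return a checker that accepts only the listed string values."""
    allowed = set(choices)
    return lambda value: value in allowed


# Hypothetical allowlist: the ranges and value sets are placeholders.
ALLOWED_OPERATIONS = {
    "playOperation": one_of("PLAY", "PAUSE"),
    "setMediaVolume": int_in_range(0, 30),
    "setMute": one_of("true", "false"),
    "enableAmbientLight": one_of("true", "false"),
    "setAmbientLightMode": int_in_range(0, 10),
}


def filter_operations(operations):
    """Split (name, argument) pairs into accepted and rejected lists."""
    accepted, rejected = [], []
    for name, argument in operations:
        check = ALLOWED_OPERATIONS.get(name)
        if check is not None and check(argument):
            accepted.append((name, argument))
        else:
            rejected.append((name, argument))  # fallback: do nothing, log it
    return accepted, rejected


ok, bad = filter_operations(
    [("playOperation", "PAUSE"), ("setMediaVolume", "6"),
     ("openDoor", "true"), ("setMute", "maybe")]
)
print("accepted:", ok)
print("rejected:", bad)
```

Rejecting by default keeps an unexpected or malformed model output from reaching vehicle functions.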

license: mit
datasets:
- XAILab-CyberSpark/Cabin-Human-Behavior-Dataset
- XAILab-CyberSpark/Cabin-Human-ABNORMAL-Behavior-Dataset
language:
- zh
