qwen3-asr-1.7b-ov-int8

OpenVINO IR export of Qwen/Qwen3-ASR-1.7B with INT8 weight compression (NNCF), for Intel CPU/iGPU inference.
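As a minimal sketch of how the IR files could be loaded for CPU or iGPU inference, the snippet below scans a local copy of this repository for `.xml`/`.bin` pairs and compiles each one with the OpenVINO runtime. The directory argument, the function names, and the choice to discover files by scanning (rather than relying on specific file names, which this card does not list) are assumptions made for illustration. This is not the export helper's own loading code, and it does not wire the sub-models into a full ASR pipeline.

```python
from pathlib import Path


def find_ir_models(model_dir):
    """Return {relative_name: xml_path} for every IR .xml with a matching .bin."""
    root = Path(model_dir)
    models = {}
    for xml_path in sorted(root.rglob("*.xml")):
        # An OpenVINO IR model is an .xml topology plus a .bin weights file.
        if xml_path.with_suffix(".bin").exists():
            name = xml_path.relative_to(root).with_suffix("").as_posix()
            models[name] = xml_path
    return models


def compile_ir_models(model_dir, device="CPU"):
    """Read and compile each discovered IR on `device` ("CPU", or "GPU" for an iGPU)."""
    import openvino as ov  # imported lazily so discovery works without OpenVINO

    core = ov.Core()
    compiled = {}
    for name, xml_path in find_ir_models(model_dir).items():
        model = core.read_model(str(xml_path))
        compiled[name] = core.compile_model(model, device)
    return compiled
```

Calling `compile_ir_models("path/to/local/copy", device="GPU")` would target the integrated GPU, assuming the Intel GPU plugin and drivers are available on the machine.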

Exported with the qwen3-asr helper from the official OpenVINO notebooks as four thinker sub-models: embedding, audio (conv frontend), audio_encoder, and language_model. The language_model is a stateful KV-cache decoder and can also be converted to PagedAttention at load time via SDPAToPagedAttention.
Tokenizer/processor files included (vocab.json, merges.txt, added_tokens.json, tokenizer.json, config.json, preprocessor_config.json, chat_template.jinja).
Measured on an Intel Arc 130V iGPU with a Korean call-center test set: micro-CER of 5.71% with a contextual-biasing system prompt, and a warm single-utterance real-time factor (RTF) of about 0.17 with a paged decoder and persistent prompt-prefix KV.
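For readers unfamiliar with the two metrics, here is a small sketch of how they are commonly computed. Micro-CER pools character edits over the whole test set before dividing by the total number of reference characters, and RTF is processing time divided by audio duration. The function names are illustrative, and no text normalization (whitespace, punctuation, Unicode form) is applied, since this card does not describe the normalization used for the reported numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def micro_cer(pairs):
    """Micro-averaged CER over (reference, hypothesis) string pairs."""
    total_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_chars = sum(len(ref) for ref, _ in pairs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    return total_edits / total_chars


def real_time_factor(processing_seconds, audio_seconds):
    """RTF below 1.0 means transcription runs faster than real time."""
    return processing_seconds / audio_seconds
```

Under this definition, an RTF of about 0.17 corresponds to roughly 1.7 seconds of processing for a 10-second utterance. Because micro-averaging weights every character equally, long utterances influence the score more than short ones.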

