Qwen3-VL-8B Garment Classifier (NVFP4)
NVFP4-quantized version of Qwen3-VL-8B SFT+GRPO for garment attribute classification. Achieves an 89.5% weighted score at 12.1 samples/s throughput: a 1.8-point drop versus full precision while running 61% faster.
Per-field evaluation scores (field weights in parentheses):

| Field | SBERT Cosine | NLI Score | SBERT+NLI | Weighted |
|---|---|---|---|---|
| type (2.5x) | 80.0% | 68.9% | 71.6% | 1.79 |
| color | 83.4% | 64.5% | 74.1% | 0.74 |
| pattern | 64.3% | 65.0% | 58.1% | 0.58 |
| closure | 47.4% | 41.1% | 39.9% | 0.40 |
| sleeve | 81.8% | 85.7% | 83.2% | 0.83 |
| neckline | 81.1% | 75.9% | 74.8% | 0.75 |
| defect (2.0x) | 97.1% | 97.1% | 97.0% | 0.98 |
| brand (1.5x) | 95.0% | 94.9% | 94.7% | 0.95 |
| size (1.5x) | 99.4% | 99.3% | 99.3% | 0.99 |
| Overall | 81.1% | 76.9% | 77.0% | 89.5% |
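The overall score combines the per-field scores using the multipliers shown in parentheses (unlisted fields weigh 1.0). The exact aggregation is defined in `peakbench_metrics.json`; the sketch below shows a generic weight-normalized average and is illustrative, not PeakBench's formula:

```python
# Illustrative field weights from the table above; unlisted fields default to 1.0.
WEIGHTS = {"type": 2.5, "defect": 2.0, "brand": 1.5, "size": 1.5}

def weighted_score(field_scores: dict) -> float:
    """Weight-normalized average of per-field scores in [0, 1]."""
    total = sum(WEIGHTS.get(field, 1.0) * score for field, score in field_scores.items())
    weight_sum = sum(WEIGHTS.get(field, 1.0) for field in field_scores)
    return total / weight_sum

# type counts 2.5x as much as color: (0.8 * 2.5 + 0.6 * 1.0) / 3.5
print(round(weighted_score({"type": 0.8, "color": 0.6}), 3))  # → 0.743
```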
Comparison against the full-precision checkpoint:

| Variant | Weighted | SBERT+NLI | JSON% | Throughput | Size |
|---|---|---|---|---|---|
| Full precision (bf16) | 91.3% | 78.7% | 100% | 7.5/s | 17.5 GB |
| NVFP4 (this model) | 89.5% | 77.0% | 100% | 12.1/s | 7.0 GB |
Serve with vLLM:

```bash
python -m vllm.entrypoints.openai.api_server \
  --model Denali-AI/qwen3-vl-8b-garment-classifier-nvfp4 \
  --quantization modelopt_fp4 \
  --max-model-len 4096 \
  --trust-remote-code \
  --gpu-memory-utilization 0.45
```
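Once the server is up, it answers on vLLM's OpenAI-compatible chat endpoint (default `http://localhost:8000/v1`). A minimal stdlib sketch of building such a request; the helper name and the data-URL image encoding are illustrative, not part of this model's API:

```python
import base64
import json
import urllib.request

PROMPT = ("Classify this garment. Return JSON with: type, color, pattern, "
          "neckline, sleeve_length, closure, brand, size, defect_type.")

def build_request(image_path: str, model: str) -> dict:
    """Build an OpenAI-style chat payload with the image inlined as a data URL."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    return {
        "model": model,
        "messages": [{"role": "user", "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            {"type": "text", "text": PROMPT},
        ]}],
        "max_tokens": 256,
        "temperature": 0.0,
    }

# POST to the running server (assumes the default vLLM port):
# payload = json.dumps(build_request("garment.jpg",
#     "Denali-AI/qwen3-vl-8b-garment-classifier-nvfp4")).encode()
# req = urllib.request.Request("http://localhost:8000/v1/chat/completions",
#     data=payload, headers={"Content-Type": "application/json"})
# print(json.load(urllib.request.urlopen(req))["choices"][0]["message"]["content"])
```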
The model extracts 9 structured attributes from garment images: `type`, `color`, `pattern`, `neckline`, `sleeve_length`, `closure`, `brand`, `size`, `defect_type`.

All benchmarks are run through PeakBench and automatically synced to this model card. Metric definitions: `peakbench_metrics.json`
Usage with Transformers:

```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

model = AutoModelForImageTextToText.from_pretrained(
    "Denali-AI/qwen3-vl-8b-garment-classifier-nvfp4",
    torch_dtype="bfloat16", device_map="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(
    "Denali-AI/qwen3-vl-8b-garment-classifier-nvfp4", trust_remote_code=True
)

image = Image.open("garment.jpg").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Classify this garment. Return JSON with: type, color, pattern, neckline, sleeve_length, closure, brand, size, defect_type."},
]}]

# Build the chat prompt, run greedy decoding, and keep only the newly generated tokens.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
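The decoded reply is a JSON object with the nine attributes. A small helper to parse it into a dict; the defensive stripping of markdown fences is an assumption about possible model output, not documented behavior:

```python
import json

EXPECTED_KEYS = {"type", "color", "pattern", "neckline", "sleeve_length",
                 "closure", "brand", "size", "defect_type"}

def parse_attributes(raw: str) -> dict:
    """Parse the model's JSON reply, tolerating an optional ```json fence around it."""
    text = raw.strip()
    if text.startswith("```"):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[4:]
    attrs = json.loads(text)
    missing = EXPECTED_KEYS - attrs.keys()
    if missing:
        raise ValueError(f"missing attributes: {sorted(missing)}")
    return attrs

reply = ('{"type": "t-shirt", "color": "navy", "pattern": "solid", '
         '"neckline": "crew", "sleeve_length": "short", "closure": "none", '
         '"brand": "unknown", "size": "M", "defect_type": "none"}')
print(parse_attributes(reply)["type"])  # → t-shirt
```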
Base model: Qwen/Qwen3-VL-8B-Instruct