
qwen3-vl-8b-garment-classifier-nvfp4

NVFP4-quantized version of Qwen3-VL-8B (SFT + GRPO) for garment attribute classification. Achieves an 89.5% weighted score at 12.1 samples/s — only 1.8 percentage points below full precision while delivering 61% higher throughput.

Training & Quantization

  • Base model: Qwen/Qwen3-VL-8B-Instruct
  • Fine-tuning: SFT (Stage 1) + GRPO (Stage 2) with LoRA (r=16, alpha=32)
  • Quantization: NVFP4 via NVIDIA ModelOpt with 512-sample calibration
  • Excluded from quantization: lm_head, visual encoder
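To build intuition for what NVFP4 quantization does, here is a simplified, pure-Python fake-quantization sketch (not the ModelOpt implementation): FP4 E2M1 can represent only ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}, and NVFP4 scales each small block of weights so the block's largest magnitude maps onto the largest FP4 code. Block size and the scale rule here are illustrative assumptions; the real format also stores block scales in FP8 E4M3.

```python
# Illustrative NVFP4-style block fake-quantization. Assumptions: 16-element
# blocks, per-block scale chosen so the block max hits the top FP4 code (6).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # magnitudes FP4 E2M1 can encode

def quantize_block(block):
    """Fake-quantize one block: pick a scale, snap each value to the FP4 grid."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return list(block), 0.0
    scale = amax / 6.0  # map the block's max magnitude onto the largest FP4 code
    deq = []
    for x in block:
        code = min(FP4_GRID, key=lambda g: abs(abs(x) / scale - g))
        deq.append(code * scale * (1 if x >= 0 else -1))
    return deq, scale

def nvfp4_fake_quant(values, block_size=16):
    """Quantize-dequantize a flat list of weights block by block."""
    out = []
    for i in range(0, len(values), block_size):
        deq, _ = quantize_block(values[i:i + block_size])
        out.extend(deq)
    return out
```

Values that already sit on the scaled grid survive exactly; everything else snaps to the nearest representable point, which is the quantization error the 512-sample calibration set is used to minimize.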

Benchmark: 3.5k Hard Eval

| Field | SBERT Cosine | NLI Score | SBERT+NLI | Weighted |
|---------------|-------|-------|-------|-------|
| type (2.5x)   | 80.0% | 68.9% | 71.6% | 1.79  |
| color         | 83.4% | 64.5% | 74.1% | 0.74  |
| pattern       | 64.3% | 65.0% | 58.1% | 0.58  |
| closure       | 47.4% | 41.1% | 39.9% | 0.40  |
| sleeve        | 81.8% | 85.7% | 83.2% | 0.83  |
| neckline      | 81.1% | 75.9% | 74.8% | 0.75  |
| defect (2.0x) | 97.1% | 97.1% | 97.0% | 0.98  |
| brand (1.5x)  | 95.0% | 94.9% | 94.7% | 0.95  |
| size (1.5x)   | 99.4% | 99.3% | 99.3% | 0.99  |
| Overall       | 81.1% | 76.9% | 77.0% | 89.5% |

Full Precision vs NVFP4

| Variant | Weighted | SBERT+NLI | JSON% | Throughput | Size |
|-----------------------|-------|-------|------|--------|---------|
| Full precision (bf16) | 91.3% | 78.7% | 100% | 7.5/s  | 17.5 GB |
| NVFP4 (this model)    | 89.5% | 77.0% | 100% | 12.1/s | 7.0 GB  |
  • Accuracy loss: 1.8 percentage points weighted (91.3% → 89.5%)
  • Throughput gain: +61% (7.5 → 12.1 samples/s)
  • Size reduction: 60% (17.5 GB → 7.0 GB)

Serving with vLLM

python -m vllm.entrypoints.openai.api_server \
  --model Denali-AI/qwen3-vl-8b-garment-classifier-nvfp4 \
  --quantization modelopt_fp4 \
  --max-model-len 4096 \
  --trust-remote-code \
  --gpu-memory-utilization 0.45
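Once the server is up, it speaks the OpenAI chat-completions protocol. The sketch below builds a request payload with a base64-encoded image; the localhost URL and `garment.jpg` are placeholders, and the payload shape follows the standard OpenAI multimodal message format.

```python
# Build an OpenAI-compatible chat-completions request for the vLLM server
# started above. Only the payload construction is shown; POSTing it requires
# the server to be running.
import base64

def build_request(image_bytes,
                  model="Denali-AI/qwen3-vl-8b-garment-classifier-nvfp4"):
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text",
                 "text": "Classify this garment. Return JSON with: type, color, "
                         "pattern, neckline, sleeve_length, closure, brand, "
                         "size, defect_type."},
            ],
        }],
        "max_tokens": 256,
        "temperature": 0,  # deterministic decoding, matching the Usage snippet
    }

# Example (requires a running server):
#   import requests
#   payload = build_request(open("garment.jpg", "rb").read())
#   r = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
#   print(r.json()["choices"][0]["message"]["content"])
```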

Target Fields

The model extracts 9 structured attributes from garment images:

  • type, color, pattern, neckline, sleeve_length, closure, brand, size, defect_type
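Since downstream pipelines depend on all nine keys being present, a validation step is useful. This is a hypothetical helper (not shipped with the model) that parses the model's JSON output and rejects responses with missing fields:

```python
# Hypothetical validator for the model's structured output: checks that the
# returned JSON contains all nine expected attribute keys.
import json

EXPECTED_FIELDS = {"type", "color", "pattern", "neckline", "sleeve_length",
                   "closure", "brand", "size", "defect_type"}

def parse_attributes(raw):
    """Parse a JSON string from the model and verify every field is present."""
    data = json.loads(raw)
    missing = EXPECTED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data
```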

Evaluation Platform

PeakBench Metrics

All benchmarks are run through PeakBench and automatically synced to this model card. Metric definitions: peakbench_metrics.json

Usage

from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

model = AutoModelForImageTextToText.from_pretrained(
    "Denali-AI/qwen3-vl-8b-garment-classifier-nvfp4",
    torch_dtype="bfloat16", device_map="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(
    "Denali-AI/qwen3-vl-8b-garment-classifier-nvfp4", trust_remote_code=True
)

image = Image.open("garment.jpg").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Classify this garment. Return JSON with: type, color, pattern, neckline, sleeve_length, closure, brand, size, defect_type."}
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
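The decoded generation above should be plain JSON, but vision-language models sometimes wrap output in a markdown code fence or add surrounding prose. A small, hypothetical post-processing helper makes parsing robust to that (it assumes the response contains exactly one top-level JSON object):

```python
# Hypothetical helper: extract the first JSON object from decoded model
# output, tolerating code fences or surrounding text.
import json
import re

def extract_json(text):
    """Return the parsed JSON object embedded in the generation text."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))
```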