
qwen3-vl-8b-garment-classifier-nvfp4

NVFP4-quantized version of Qwen3-VL-8B (SFT + GRPO) for garment attribute classification. Achieves an 89.5% weighted score at 12.1 samples/s — only 1.8 percentage points below full precision while delivering 61% higher throughput.

Training & Quantization

  • Base model: Qwen/Qwen3-VL-8B-Instruct
  • Fine-tuning: SFT (Stage 1) + GRPO (Stage 2) with LoRA (r=16, alpha=32)
  • Quantization: NVFP4 via NVIDIA ModelOpt with 512-sample calibration
  • Excluded from quantization: lm_head, visual encoder
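To build intuition for what NVFP4 quantization does, here is a simplified, pure-Python fake-quantization sketch (not the ModelOpt implementation): FP4 E2M1 can represent only ±{0, 0.5, 1, 1.5, 2, 3, 4, 6}, and NVFP4 scales each small block of weights so the block's largest magnitude maps onto the largest FP4 code. Block size and the scale rule here are illustrative assumptions; the real format also stores block scales in FP8 E4M3.

```python
# Illustrative NVFP4-style block fake-quantization. Assumptions: 16-element
# blocks, per-block scale chosen so the block max hits the top FP4 code (6).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # magnitudes FP4 E2M1 can encode

def quantize_block(block):
    """Fake-quantize one block: pick a scale, snap each value to the FP4 grid."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return list(block), 0.0
    scale = amax / 6.0  # map the block's max magnitude onto the largest FP4 code
    deq = []
    for x in block:
        code = min(FP4_GRID, key=lambda g: abs(abs(x) / scale - g))
        deq.append(code * scale * (1 if x >= 0 else -1))
    return deq, scale

def nvfp4_fake_quant(values, block_size=16):
    """Quantize-dequantize a flat list of weights block by block."""
    out = []
    for i in range(0, len(values), block_size):
        deq, _ = quantize_block(values[i:i + block_size])
        out.extend(deq)
    return out
```

Values that already sit on the scaled grid survive exactly; everything else snaps to the nearest representable point, which is the quantization error the 512-sample calibration set is used to minimize.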

Benchmark: 3.5k Hard Eval

| Field | SBERT Cosine | NLI Score | SBERT+NLI | Weighted |
|---------------|-------|-------|-------|-------|
| type (2.5x)   | 80.0% | 68.9% | 71.6% | 1.79  |
| color         | 83.4% | 64.5% | 74.1% | 0.74  |
| pattern       | 64.3% | 65.0% | 58.1% | 0.58  |
| closure       | 47.4% | 41.1% | 39.9% | 0.40  |
| sleeve        | 81.8% | 85.7% | 83.2% | 0.83  |
| neckline      | 81.1% | 75.9% | 74.8% | 0.75  |
| defect (2.0x) | 97.1% | 97.1% | 97.0% | 0.98  |
| brand (1.5x)  | 95.0% | 94.9% | 94.7% | 0.95  |
| size (1.5x)   | 99.4% | 99.3% | 99.3% | 0.99  |
| Overall       | 81.1% | 76.9% | 77.0% | 89.5% |

Full Precision vs NVFP4

| Variant | Weighted | SBERT+NLI | JSON% | Throughput | Size |
|-----------------------|-------|-------|------|--------|---------|
| Full precision (bf16) | 91.3% | 78.7% | 100% | 7.5/s  | 17.5 GB |
| NVFP4 (this model)    | 89.5% | 77.0% | 100% | 12.1/s | 7.0 GB  |
  • Accuracy loss: 1.8 percentage points weighted (91.3% → 89.5%)
  • Throughput gain: +61% (7.5 → 12.1 samples/s)
  • Size reduction: 60% (17.5 GB → 7.0 GB)

Serving with vLLM

python -m vllm.entrypoints.openai.api_server \
  --model Denali-AI/qwen3-vl-8b-garment-classifier-nvfp4 \
  --quantization modelopt_fp4 \
  --max-model-len 4096 \
  --trust-remote-code \
  --gpu-memory-utilization 0.45
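Once the server is up, it speaks the OpenAI chat-completions protocol. The sketch below builds a request payload with a base64-encoded image; the localhost URL and `garment.jpg` are placeholders, and the payload shape follows the standard OpenAI multimodal message format.

```python
# Build an OpenAI-compatible chat-completions request for the vLLM server
# started above. Only the payload construction is shown; POSTing it requires
# the server to be running.
import base64

def build_request(image_bytes,
                  model="Denali-AI/qwen3-vl-8b-garment-classifier-nvfp4"):
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                {"type": "text",
                 "text": "Classify this garment. Return JSON with: type, color, "
                         "pattern, neckline, sleeve_length, closure, brand, "
                         "size, defect_type."},
            ],
        }],
        "max_tokens": 256,
        "temperature": 0,  # deterministic decoding, matching the Usage snippet
    }

# Example (requires a running server):
#   import requests
#   payload = build_request(open("garment.jpg", "rb").read())
#   r = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
#   print(r.json()["choices"][0]["message"]["content"])
```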

Target Fields

The model extracts 9 structured attributes from garment images:

  • type, color, pattern, neckline, sleeve_length, closure, brand, size, defect_type
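Since downstream pipelines depend on all nine keys being present, a validation step is useful. This is a hypothetical helper (not shipped with the model) that parses the model's JSON output and rejects responses with missing fields:

```python
# Hypothetical validator for the model's structured output: checks that the
# returned JSON contains all nine expected attribute keys.
import json

EXPECTED_FIELDS = {"type", "color", "pattern", "neckline", "sleeve_length",
                   "closure", "brand", "size", "defect_type"}

def parse_attributes(raw):
    """Parse a JSON string from the model and verify every field is present."""
    data = json.loads(raw)
    missing = EXPECTED_FIELDS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data
```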

Evaluation Platform

PeakBench Metrics

All benchmarks are run through PeakBench and automatically synced to this model card. Metric definitions: peakbench_metrics.json

Usage

from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image

model = AutoModelForImageTextToText.from_pretrained(
    "Denali-AI/qwen3-vl-8b-garment-classifier-nvfp4",
    torch_dtype="bfloat16", device_map="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(
    "Denali-AI/qwen3-vl-8b-garment-classifier-nvfp4", trust_remote_code=True
)

image = Image.open("garment.jpg").convert("RGB")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Classify this garment. Return JSON with: type, color, pattern, neckline, sleeve_length, closure, brand, size, defect_type."}
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
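The decoded generation above should be plain JSON, but vision-language models sometimes wrap output in a markdown code fence or add surrounding prose. A small, hypothetical post-processing helper makes parsing robust to that (it assumes the response contains exactly one top-level JSON object):

```python
# Hypothetical helper: extract the first JSON object from decoded model
# output, tolerating code fences or surrounding text.
import json
import re

def extract_json(text):
    """Return the parsed JSON object embedded in the generation text."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    return json.loads(match.group(0))
```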