---
license: apache-2.0
tags:
- vision
- design
- qwen
- fine-tuned
- visual-quality
- pairwise-comparison
base_model: Qwen/Qwen3.5-0.8B
pipeline_tag: image-text-to-text
---

# Qwen Visual Design Judge

A fine-tuned Qwen3.5-0.8B model that judges visual design quality between pairs of images.

## 🎯 Performance

| Metric | Accuracy |
|--------|----------|
| Overall | **82%** |
| High-agreement pairs (≥80% labeler agreement) | 90.9% |
| Low-agreement pairs (<80% labeler agreement) | 79.5% |

Matches GPT-4.1's accuracy on this task while being roughly 1000× cheaper to run locally.

## 📊 Training

- **Base model**: Qwen/Qwen3.5-0.8B
- **Training data**: 40K synthetic preference pairs labeled by GPT-4.1
- **Domains**: Landing pages, websites, mobile UI, graphics
- **Epochs**: 1
- **Hardware**: NVIDIA T4 GPU (~13 hours)

## 🚀 Usage

```python
import torch
from transformers import Qwen3_5ForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen3_5ForConditionalGeneration.from_pretrained(
    "DillonNys/qwen-visual-design-judge",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
processor = AutoProcessor.from_pretrained("DillonNys/qwen-visual-design-judge")


def judge_pair(img_a: str, img_b: str) -> str:
    """Return "A" or "B" for whichever image has better design quality."""
    prompt = """You are an expert visual design judge. Compare these two images and determine which has better visual design quality.

Consider: layout, typography, color harmony, visual hierarchy, spacing, and overall aesthetic appeal.

Respond with ONLY "A" or "B" to indicate the better design."""

    messages = [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "text", "text": "\n\nImage A:"},
            {"type": "image", "image": img_a},
            {"type": "text", "text": "\n\nImage B:"},
            {"type": "image", "image": img_b},
            {"type": "text", "text": "\n\nWhich is better? Answer A or B:"},
        ],
    }]

    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(messages)
    inputs = processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    ).to("cuda")

    # Greedy decoding; the model only needs to emit a single-letter verdict.
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=8, do_sample=False)

    # Decode only the newly generated tokens, skipping the prompt.
    response = processor.decode(
        output_ids[0, inputs.input_ids.shape[1]:], skip_special_tokens=True
    ).strip()
    return "A" if "A" in response.upper() else "B"


# Example
winner = judge_pair("design_a.png", "design_b.png")
print(f"Better design: {winner}")
```

## 📝 Citation

If you use this model, please cite:

```bibtex
@misc{qwen-visual-design-judge,
  author = {Dillon Nys},
  title = {Qwen Visual Design Judge},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/DillonNys/qwen-visual-design-judge}
}
```

## 🙏 Acknowledgments

- Qwen team for the excellent base model
- OpenAI for GPT-4.1, used for synthetic labeling
- The Vibe Arena community for preference data
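
## 💡 Tip: mitigating position bias

Pairwise judges can favor whichever image appears first. A common mitigation is to run `judge_pair` twice, once with the images swapped, and only accept verdicts that agree across both orderings. The `debiased_judgement` helper below is not part of this repository — it is a minimal sketch of that idea:

```python
def debiased_judgement(verdict_ab: str, verdict_ba: str) -> str:
    """Combine verdicts from both image orderings.

    verdict_ab: the verdict when the images were shown as (A, B).
    verdict_ba: the verdict when the same images were shown swapped,
                so its label must be flipped before comparison.
    Returns "A" or "B" when the two runs agree, or "tie" otherwise.
    """
    flipped = "B" if verdict_ba == "A" else "A"
    return verdict_ab if verdict_ab == flipped else "tie"


# Usage with the judge (requires the model; not run here):
# winner = debiased_judgement(
#     judge_pair("design_a.png", "design_b.png"),
#     judge_pair("design_b.png", "design_a.png"),
# )
```

Treating disagreements as ties also gives a rough per-pair confidence signal: consistent verdicts tend to correlate with the high-agreement pairs reported above.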