# ImagineClassification (fine-tuned ViT)
Fine-tuned Vision Transformer (ViT-B/16, patch 16, 224×224) for coarse fashion product classification into four masterCategory labels from the Fashion Product Images (small) dataset.
## Model summary
| Item | Detail |
|---|---|
| Base checkpoint | google/vit-base-patch16-224-in21k |
| Task | Multi-class image classification (4 classes) |
| Labels | Apparel, Accessories, Footwear, Personal Care |
| Input | RGB images, 224×224 (use the bundled ViTImageProcessor / AutoImageProcessor) |
| Framework | PyTorch + Transformers |
This repository was produced by comparing three candidate image classifiers (same data and training recipe), then packaging the best checkpoint by test accuracy.
## Training procedure (from experiment notebook)
Training and evaluation follow the pipeline described in Pipeline_1_fine_tuning_models.ipynb:
- Data: `ashraq/fashion-product-images-small` (train split), restricted to rows whose `masterCategory` is one of the four classes above.
- Balanced sampling: for each class, 2,100 images sampled with `random_state=5` (SEED = 5):
  - 100 images per class held out as a stratified out-of-sample test set (400 images total).
  - Remaining 2,000 per class form the train/val pool; stratified train/validation split (same seed).
- Optimization: AdamW, learning rate 2e-5, batch size 8, 1 fine-tuning epoch, cross-entropy loss.
- Candidates fine-tuned (same recipe): `google/vit-base-patch16-224-in21k`, `facebook/deit-tiny-patch16-224`, `google/mobilenet_v2_1.0_224`.
- Selection rule: highest test accuracy on the held-out 400-sample test set; ties broken by the loop order in the notebook.
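The balanced sampling and stratified split above can be sketched as follows. This is a minimal illustration with synthetic `(image_id, masterCategory)` rows standing in for the real dataset, and the 20% validation fraction is an assumption (the notebook fixes its own); the per-class counts and seed mirror the description above:

```python
from collections import Counter

from sklearn.model_selection import train_test_split

SEED = 5
CLASSES = ["Apparel", "Accessories", "Footwear", "Personal Care"]
PER_CLASS = 2100          # images sampled per class
TEST_PER_CLASS = 100      # per-class hold-out -> 400-image test set

# Stand-in rows: (image_id, label) pairs instead of real images.
ids = [f"{c}-{i}" for c in CLASSES for i in range(PER_CLASS)]
labels = [c for c in CLASSES for _ in range(PER_CLASS)]

# Stratified hold-out: exactly 100 images per class land in the test set.
pool_ids, test_ids, pool_y, test_y = train_test_split(
    ids, labels,
    test_size=TEST_PER_CLASS * len(CLASSES),
    stratify=labels,
    random_state=SEED,
)

# Remaining 2,000 per class: stratified train/validation split (same seed).
# The 0.2 validation fraction here is an assumption for illustration.
train_ids, val_ids, train_y, val_y = train_test_split(
    pool_ids, pool_y,
    test_size=0.2,
    stratify=pool_y,
    random_state=SEED,
)

print(Counter(test_y))  # 100 images per class
```

Stratifying both splits keeps every subset class-balanced, which is what makes the 400-image test set a fair four-way benchmark.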
## Reported results (one Colab run, post fine-tuning)
Metrics below are from the 400-image balanced test set; runtime is the mean seconds per image during evaluation in that run (device-dependent).
| Model | Test accuracy | Runtime (s / image) |
|---|---|---|
| `google/vit-base-patch16-224-in21k` | 0.9975 | 0.000953 |
| `facebook/deit-tiny-patch16-224` | 0.9950 | 0.000886 |
| `google/mobilenet_v2_1.0_224` | 0.9375 | 0.001365 |
Selected model: `google/vit-base-patch16-224-in21k` (test accuracy 0.9975, per the table above).

Note: Near-perfect accuracy on 400 samples does not guarantee generalization to all real-world product photos. Performance depends on image quality, viewpoint, and domain shift relative to the dataset.
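The accuracy and per-image runtime figures above come from a simple evaluation loop of roughly this shape (a sketch, not the notebook's code; `predict` stands in for a model forward pass, and the dummy classifier below is purely illustrative):

```python
import time

def evaluate(predict, images, labels):
    """Return (accuracy, mean seconds per image) for a predict function."""
    correct = 0
    start = time.perf_counter()
    for image, label in zip(images, labels):
        if predict(image) == label:
            correct += 1
    elapsed = time.perf_counter() - start
    n = len(images)
    return correct / n, elapsed / n

# Dummy stand-in for a fine-tuned model: always predicts "Apparel".
images = list(range(400))
labels = ["Apparel"] * 100 + ["Footwear"] * 300
accuracy, sec_per_image = evaluate(lambda img: "Apparel", images, labels)
print(round(accuracy, 4))  # 0.25
```

Because runtime is averaged over the whole loop, it includes preprocessing and Python overhead, which is why the numbers are device- and run-dependent.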
## Intended use
- Primary: Fast coarse category tagging for fashion e-commerce assets (four-way classification).
- Out of scope: Fine-grained SKU/subcategory prediction, non-fashion images, or classes outside the four labels above.
## Limitations and bias
- Trained only on four frequent `masterCategory` values for a class-balanced setup; other categories from the original catalog are not represented.
- The source dataset may reflect commercial catalog biases (presentation, demographics, geography).
- Do not use for high-stakes decisions (e.g., safety, compliance, or financial outcomes) without further validation.
## How to use
```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "Leoinhouse/ImagineClassification-finetuned-model"  # or local path
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id)
model.eval()

# Load any RGB product photo.
image = Image.open(
    requests.get("https://example.com/product.jpg", stream=True).raw
).convert("RGB")

# Preprocess to the 224x224 input the model expects, then classify.
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

predicted_id = outputs.logits.argmax(-1).item()
label = model.config.id2label[predicted_id]
print(label)
```
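To turn the logits into per-label confidence scores, apply a softmax over the four classes. When running the model above, `torch.softmax(outputs.logits, dim=-1)` is the one-liner; the dependency-free sketch below shows the same computation on illustrative logit values (not real model output):

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for the four labels, in id2label order.
labels = ["Apparel", "Accessories", "Footwear", "Personal Care"]
probs = softmax([4.2, 0.1, -1.3, 0.4])
for label, p in sorted(zip(labels, probs), key=lambda x: -x[1]):
    print(f"{label}: {p:.3f}")
```

Reporting the full probability vector (rather than just the argmax) is useful when you want to threshold low-confidence predictions before tagging a product.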
## Citation
If you use this model, cite the Transformers library and the dataset you rely on, for example:
```bibtex
@inproceedings{wolf-etal-2020-transformers,
  title     = {Transformers: State-of-the-Art Natural Language Processing},
  author    = {Wolf, Thomas and others},
  booktitle = {EMNLP 2020: System Demonstrations},
  year      = {2020}
}
```
## Model card contact
Maintained for coursework / project use under Hugging Face user Leoinhouse. For the upstream ViT architecture and weights, see the base model card: google/vit-base-patch16-224-in21k.