ImagineClassification (fine-tuned ViT)

Fine-tuned Vision Transformer (ViT-B/16, patch 16, 224×224) for coarse fashion product classification into four masterCategory labels from the Fashion Product Images (small) dataset.

Model summary

| Item | Detail |
| --- | --- |
| Base checkpoint | google/vit-base-patch16-224-in21k |
| Task | Multi-class image classification (4 classes) |
| Labels | Apparel, Accessories, Footwear, Personal Care |
| Input | RGB images, 224×224 (use the bundled ViTImageProcessor / AutoImageProcessor) |
| Framework | PyTorch + Transformers |

This repository was produced by comparing three candidate image classifiers (same data and training recipe), then packaging the best checkpoint by test accuracy.

Training procedure (from experiment notebook)

Training and evaluation follow the pipeline described in Pipeline_1_fine_tuning_models.ipynb:

  • Data: ashraq/fashion-product-images-small (train split), rows with masterCategory in the four classes above.
  • Balanced sampling: 2,100 images per class are sampled with random_state=5 (SEED = 5):
    • 100 images per class held out as a stratified out-of-sample test set (400 images total).
    • Remaining 2,000 per class form the train/val pool; stratified train/validation split (same seed).
  • Optimization: AdamW, learning rate 2e-5, batch size 8, 1 fine-tuning epoch, cross-entropy loss.
  • Candidates fine-tuned (same recipe):
    google/vit-base-patch16-224-in21k, facebook/deit-tiny-patch16-224, google/mobilenet_v2_1.0_224.
  • Selection rule: Highest test accuracy on the held-out 400-sample test set; ties broken by the loop order in the notebook.
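The sampling and split steps above can be sketched as follows. This is an illustrative reconstruction, not the notebook code: the toy DataFrame stands in for the real dataset, and the 10% validation fraction is an assumption (the card does not state the exact train/val ratio).

```python
# Sketch of the balanced sampling and stratified splits described above,
# using a toy masterCategory column in place of the real dataset.
import pandas as pd
from sklearn.model_selection import train_test_split

SEED = 5
LABELS = ["Apparel", "Accessories", "Footwear", "Personal Care"]

# Toy stand-in for the dataset (the real catalog has more rows per class).
df = pd.DataFrame({"masterCategory": LABELS * 3000})

# 2,100 images per class, sampled with a fixed seed.
balanced = df.groupby("masterCategory").sample(n=2100, random_state=SEED)

# Hold out 100 per class (400 total) as the stratified test set.
pool, test = train_test_split(
    balanced, test_size=400, stratify=balanced["masterCategory"], random_state=SEED
)

# Remaining 2,000 per class -> stratified train/validation split (same seed).
# The 10% validation fraction here is an assumption for illustration.
train, val = train_test_split(
    pool, test_size=0.1, stratify=pool["masterCategory"], random_state=SEED
)

print(len(test), len(pool))  # 400 8000
```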

Reported results (one Colab run, post fine-tuning)

Metrics are computed on the 400-image balanced test set; runtime is the mean seconds per image during evaluation in that run (device-dependent).

| Model | Test accuracy | Runtime (s / image) |
| --- | --- | --- |
| google/vit-base-patch16-224-in21k | 0.9975 | 0.000953 |
| facebook/deit-tiny-patch16-224 | 0.9950 | 0.000886 |
| google/mobilenet_v2_1.0_224 | 0.9375 | 0.001365 |

Selected model: google/vit-base-patch16-224-in21k (test accuracy 0.9975 on this split, the highest of the three candidates).

Note: Near-perfect accuracy on a 400-sample test set does not guarantee generalization to all real-world product photos. Performance depends on image quality, viewpoint, and domain shift relative to the dataset.
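The two reported numbers can be computed for any candidate classifier with a helper like the one below. This is a minimal sketch, not the notebook's evaluation loop; `predict` is a hypothetical stand-in for a fine-tuned model's forward pass.

```python
# Compute test accuracy over a held-out set and mean seconds per image.
import time

def evaluate(predict, images, labels):
    start = time.perf_counter()
    preds = [predict(img) for img in images]
    elapsed = time.perf_counter() - start
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    runtime_per_image = elapsed / len(images)  # mean s / image, device-dependent
    return accuracy, runtime_per_image

# Toy usage: a "model" that gets 3 of 4 labels right.
labels = [0, 1, 2, 3]
accuracy, sec_per_img = evaluate(lambda x: min(x, 2), labels, labels)
print(f"accuracy={accuracy:.4f}")  # accuracy=0.7500
```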

Intended use

  • Primary: Fast coarse category tagging for fashion e-commerce assets (four-way classification).
  • Out of scope: Fine-grained SKU/subcategory prediction, non-fashion images, or classes outside the four labels above.

Limitations and bias

  • Trained only on four frequent masterCategory values for a class-balanced setup; other categories from the original catalog are not represented.
  • The source dataset may reflect commercial catalog biases (presentation, demographics, geography).
  • Do not use for high-stakes decisions (e.g., safety, compliance, or financial outcomes) without further validation.

How to use

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "Leoinhouse/ImagineClassification-finetuned-model"  # or local path
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id)
model.eval()

# Load any RGB product photo.
image = Image.open(
    requests.get("https://example.com/product.jpg", stream=True).raw
).convert("RGB")

# Resize and normalize to the 224x224 input the model expects.
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():  # inference only; no gradients needed
    outputs = model(**inputs)

predicted_id = outputs.logits.argmax(-1).item()
label = model.config.id2label[predicted_id]
print(label)  # one of: Apparel, Accessories, Footwear, Personal Care
```
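To report class probabilities rather than only the argmax label, apply a softmax to the logits from the snippet above. The dummy logits and hand-written id2label mapping below are placeholders so this sketch runs standalone; in practice you would use outputs.logits and model.config.id2label.

```python
# Turn raw logits into per-class probabilities with a softmax.
import torch

logits = torch.tensor([[2.0, 0.5, -1.0, 0.1]])  # shape (batch, num_labels)
probs = torch.softmax(logits, dim=-1)           # rows sum to 1

# Placeholder mapping; the real one lives in model.config.id2label.
id2label = {0: "Apparel", 1: "Accessories", 2: "Footwear", 3: "Personal Care"}

top_prob, top_id = probs.max(dim=-1)
print(id2label[top_id.item()], round(top_prob.item(), 3))  # Apparel 0.703
```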

Citation

If you use this model, cite the Transformers library and the dataset you rely on, for example:

@inproceedings{wolf-etal-2020-transformers,
  title={Transformers: State-of-the-Art Natural Language Processing},
  author={Wolf, Thomas and others},
  booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  year={2020}
}

Model card contact

Maintained for coursework / project use under Hugging Face user Leoinhouse. For the upstream ViT architecture and weights, see the base model card: google/vit-base-patch16-224-in21k.
