---
title: Dinosaur Project
emoji: 👀
colorFrom: green
colorTo: pink
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
license: apache-2.0
short_description: A simple dinosaur species classifier, fine-tuned on ConvNext
---

<p align="center">
  <h1 align="center">Jurassic Park Dinosaur Species Classifier</h1>
</p>

## Introduction
|
| 19 |
+
This Huggingface Space contains source code for my project of building a classification model that can distinguish dinosaur species that appears in the Jurassic Park franchise.
|
| 20 |
+
The model is then used to create a simple web app (using Gradio). With my code, you can:
|
| 21 |
+
* Upload an image of a dinosaur and get prediction on its species (top-5 predictions will be shown, along with their probabilities).
|
| 22 |
+
* Train the model again with different configuration (see `src/train.py` script).
|
| 23 |
+
* Use this as a baseline model and develop your own model.
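The top-5 display can be sketched as a small post-processing step: softmax the model's logits and keep the five most probable labels. This is a pure-Python illustration; the species names and logit values below are made up, and the real app reads its labels from the trained model.

```python
import math

def top_k_predictions(logits, labels, k=5):
    """Turn raw classifier logits into the top-k (label, probability)
    pairs shown in the app. Softmax with max-subtraction for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical labels and logits, for illustration only.
species = ["Tyrannosaurus", "Velociraptor", "Triceratops",
           "Brachiosaurus", "Dilophosaurus", "Gallimimus"]
print(top_k_predictions([4.0, 2.0, 1.0, 0.5, 0.0, -1.0], species))
```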

## About the data
The original dataset can be found on Kaggle: https://www.kaggle.com/datasets/antaresl/jurassic-park-dinosaurs-dataset.
However, this dataset was crawled from the web, so it is very diverse in style, size, and quality, and it naturally contains a lot of noise
(some species that look very similar to each other are labeled incorrectly, and some images are children's drawings or fossil photographs).
So I wrote the `src/filter_and_split.py` script to:
* Filter out most of the children's drawings and fossil photographs, which I believe hurt the model's ability to learn features the most.
* Split the dataset into train/validation/test sets with a ratio of 0.8/0.1/0.1, respectively.
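The split step can be sketched as follows. This is not the actual `src/filter_and_split.py` code; the file list is hypothetical, and a fixed seed is passed explicitly so the split is reproducible:

```python
import random

def split_dataset(paths, seed=42):
    """Split a list of image paths into train/val/test with an
    0.8/0.1/0.1 ratio. A fixed seed makes the split reproducible."""
    rng = random.Random(seed)
    shuffled = list(paths)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```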

I used a CLIP model to quickly filter out invalid images, and I created a custom dataset class (found in `src/dataset.py`) to enable batch processing and speed up the filtering.
Unfortunately, I did not set a random seed in the script, which reduces its reproducibility.
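The keep-or-drop rule behind the CLIP filter can be illustrated with a small sketch. The text prompts and the decision rule here are assumptions for illustration, not the ones in `src/filter_and_split.py`:

```python
# Hypothetical CLIP text prompts; the real script may use different ones.
VALID_PROMPT = "a photo of a dinosaur"
INVALID_PROMPTS = ["a child's drawing", "a photo of a fossil skeleton"]

def keep_image(clip_probs):
    """clip_probs maps each text prompt to the CLIP softmax probability
    assigned to one image. Keep the image only if the valid prompt
    scores highest."""
    return max(clip_probs, key=clip_probs.get) == VALID_PROMPT
```

In practice the probabilities come from scoring each image against all prompts with a CLIP model, batched through the custom dataset class.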

## Training
After trying different models such as ResNet and EfficientNet, I found that ConvNeXt Tiny yields the best results and the most stable model. The model was trained on a Kaggle P100 GPU.
The `src/train.py` script is used to train the model. The detailed process of exploring the data, setting up training, examining the training run, and evaluating the model on test data
can be found in the `dinosaur_species_classification.ipynb` notebook.

## Trained model
You can find my final model for this project in the `model` directory.

## Final results
The model achieved an accuracy of 0.75 and a weighted F1 score of roughly 0.76, which to me is not too shabby, considering the quality of the data and the limited resources available.
You can see the confusion matrix and a table of the top-10 misclassified pairs at the end of the `dinosaur_species_classification.ipynb` notebook for a closer look at the model's performance.
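For reference, the weighted F1 score averages per-class F1 values weighted by each class's support. A small pure-Python sketch of what `sklearn.metrics.f1_score(..., average="weighted")` computes:

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 scores averaged with weights proportional to each
    class's support (count in y_true)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == cls)
        pred_pos = sum(1 for p in y_pred if p == cls)
        precision = tp / pred_pos if pred_pos else 0.0
        recall = tp / n_cls
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += (n_cls / total) * f1
    return score
```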

## Suggestions for improvement
There are several ideas that I did not try, but that you could, in order to improve this model:
* Neural Style Transfer: Our data spans a variety of styles (2D hand drawings, 2D digital drawings, 3D models, ...), so the model may be prone to learning an image's style rather than biological features.
To address this, you can use a Neural Style Transfer model to generate stylized versions of the original dataset and train the model on this new dataset to see if there is any improvement.
* Bilinear Pooling: This is a very interesting method: it uses two CNN backbones to generate two feature maps, then computes their outer product to create a new feature map that
captures how the original features interact. This has proved effective in fine-grained classification tasks like this one, where some classes differ only slightly.
However, this method comes with a computational cost, so you should look into Compact Bilinear Pooling, a method that estimates the outer product of the feature maps instead of
computing it exactly.
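The core of bilinear pooling can be sketched in a few lines, assuming each backbone yields one feature vector per spatial location (a real implementation would use batched tensor ops such as `torch.einsum` on the two feature maps):

```python
def bilinear_pool(feats_a, feats_b):
    """Bilinear pooling sketch: at each spatial location, take the outer
    product of the two backbones' feature vectors, sum-pool over
    locations, and flatten into a single descriptor vector."""
    dim_a, dim_b = len(feats_a[0]), len(feats_b[0])
    pooled = [[0.0] * dim_b for _ in range(dim_a)]
    for fa, fb in zip(feats_a, feats_b):  # iterate spatial locations
        for i, a in enumerate(fa):
            for j, b in enumerate(fb):
                pooled[i][j] += a * b
    return [v for row in pooled for v in row]
```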
* Synthetic Data Generation: Our dataset is relatively small (only ~70 images per class), so you should consider using generative AI models to create new samples.
However, do note that you will need to generate samples in a way that preserves the distinguishing features of each class. With a larger dataset, you can consider unfreezing more layers of ConvNeXt Tiny
for fine-tuning, or using Vision Transformer models.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|