---
title: Dinosaur Project
emoji: 👀
colorFrom: green
colorTo: pink
sdk: gradio
sdk_version: 5.44.1
app_file: app.py
license: apache-2.0
short_description: A simple dinosaur species classifier, fine-tuned on ConvNext
---

<p align="center">
  <h1 align="center">Jurassic Park Dinosaur Species Classifier</h1>
</p>

## Introduction
|
| 19 |
+
This Huggingface Space contains source code for my project of building a classification model that can distinguish dinosaur species that appears in the Jurassic Park franchise.
|
| 20 |
+
The model is then used to create a simple web app (using Gradio). With my code, you can:
|
| 21 |
+
* Upload an image of a dinosaur and get prediction on its species (top-5 predictions will be shown, along with their probabilities).
|
| 22 |
+
* Train the model again with different configuration (see `src/train.py` script).
|
| 23 |
+
* Use this as a baseline model and develop your own model.
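The top-5 display can be sketched as a small post-processing step: softmax the model's logits and keep the five most probable labels. This is a pure-Python illustration; the species names and logit values below are made up, and the real app reads its labels from the trained model.

```python
import math

def top_k_predictions(logits, labels, k=5):
    """Turn raw classifier logits into the top-k (label, probability)
    pairs shown in the app. Softmax with max-subtraction for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical labels and logits, for illustration only.
species = ["Tyrannosaurus", "Velociraptor", "Triceratops",
           "Brachiosaurus", "Dilophosaurus", "Gallimimus"]
print(top_k_predictions([4.0, 2.0, 1.0, 0.5, 0.0, -1.0], species))
```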

## About the data
The original dataset can be found on Kaggle: https://www.kaggle.com/datasets/antaresl/jurassic-park-dinosaurs-dataset.
However, this dataset was crawled from the web, so it is very diverse in style, size, and quality, and it naturally contains a lot of noise
(some species that look very similar to each other are labeled incorrectly, and some images are children's drawings or fossil photographs).
So I wrote the `src/filter_and_split.py` script to:
* Filter out most of the children's drawings and fossil photographs, which I believe hurt the model's ability to learn features the most.
* Split the dataset into train/validation/test sets with a ratio of 0.8/0.1/0.1, respectively.
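The split step can be sketched as follows. This is not the actual `src/filter_and_split.py` code; the file list is hypothetical, and a fixed seed is passed explicitly so the split is reproducible:

```python
import random

def split_dataset(paths, seed=42):
    """Split a list of image paths into train/val/test with an
    0.8/0.1/0.1 ratio. A fixed seed makes the split reproducible."""
    rng = random.Random(seed)
    shuffled = list(paths)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```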

I used a CLIP model to quickly filter out invalid images, and I created a custom dataset class (found in `src/dataset.py`) to enable batch processing and speed up the filtering.
Unfortunately, I did not set a random seed in the script, which reduces its reproducibility.
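The keep-or-drop rule behind the CLIP filter can be illustrated with a small sketch. The text prompts and the decision rule here are assumptions for illustration, not the ones in `src/filter_and_split.py`:

```python
# Hypothetical CLIP text prompts; the real script may use different ones.
VALID_PROMPT = "a photo of a dinosaur"
INVALID_PROMPTS = ["a child's drawing", "a photo of a fossil skeleton"]

def keep_image(clip_probs):
    """clip_probs maps each text prompt to the CLIP softmax probability
    assigned to one image. Keep the image only if the valid prompt
    scores highest."""
    return max(clip_probs, key=clip_probs.get) == VALID_PROMPT
```

In practice the probabilities come from scoring each image against all prompts with a CLIP model, batched through the custom dataset class.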

## Training
After trying different models such as ResNet and EfficientNet, I found that ConvNeXt Tiny yields the best results and the most stable model. The model was trained on a Kaggle P100 GPU.
The `src/train.py` script is used to train the model. The detailed process of exploring the data, setting up training, examining the training run, and evaluating the model on test data
can be found in the `dinosaur_species_classification.ipynb` notebook.

## Trained model
You can find my final model for this project in the `model` directory.

## Final results
The model achieved an accuracy of 0.75 and a weighted F1 score of roughly 0.76, which to me is not too shabby, considering the quality of the data and the limited resources available.
You can see the confusion matrix and a table of the top-10 misclassified pairs at the end of the `dinosaur_species_classification.ipynb` notebook for a closer look at the model's performance.
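For reference, the weighted F1 score averages per-class F1 values weighted by each class's support. A small pure-Python sketch of what `sklearn.metrics.f1_score(..., average="weighted")` computes:

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1 scores averaged with weights proportional to each
    class's support (count in y_true)."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == cls)
        pred_pos = sum(1 for p in y_pred if p == cls)
        precision = tp / pred_pos if pred_pos else 0.0
        recall = tp / n_cls
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += (n_cls / total) * f1
    return score
```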

## Suggestions for improvement
There are several ideas that I did not try, but that you could, in order to improve this model:
* Neural Style Transfer: Our data spans a variety of styles (2D hand drawings, 2D digital drawings, 3D models, ...), so the model may be prone to learning an image's style rather than biological features.
To address this, you can use a Neural Style Transfer model to generate stylized versions of the original dataset and train the model on this new dataset to see if there is any improvement.
* Bilinear Pooling: This is a very interesting method: it uses two CNN backbones to generate two feature maps, then computes their outer product to create a new feature map that
captures how the original features interact. This has proved effective in fine-grained classification tasks like this one, where some classes differ only slightly.
However, this method comes with a computational cost, so you should look into Compact Bilinear Pooling, a method that estimates the outer product of the feature maps instead of
computing it exactly.
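The core of bilinear pooling can be sketched in a few lines, assuming each backbone yields one feature vector per spatial location (a real implementation would use batched tensor ops such as `torch.einsum` on the two feature maps):

```python
def bilinear_pool(feats_a, feats_b):
    """Bilinear pooling sketch: at each spatial location, take the outer
    product of the two backbones' feature vectors, sum-pool over
    locations, and flatten into a single descriptor vector."""
    dim_a, dim_b = len(feats_a[0]), len(feats_b[0])
    pooled = [[0.0] * dim_b for _ in range(dim_a)]
    for fa, fb in zip(feats_a, feats_b):  # iterate spatial locations
        for i, a in enumerate(fa):
            for j, b in enumerate(fb):
                pooled[i][j] += a * b
    return [v for row in pooled for v in row]
```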
* Synthetic Data Generation: Our dataset is relatively small (only ~70 images per class), so you should consider using generative AI models to create new samples.
However, do note that you will need to generate samples in a way that preserves the distinguishing features of each class. With a larger dataset, you can consider unfreezing more layers of ConvNeXt Tiny
for fine-tuning, or using Vision Transformer models.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|