🎬 UCF101 Video Action Classifier
A video action recognition app built with ResNet18 + 2-layer LSTM, trained incrementally on the UCF101 dataset using class-wise replay memory.
🧠 Model Architecture
| Component | Details |
|---|---|
| Backbone | ResNet18 (layer3 + layer4 unfrozen) |
| Temporal | 2-layer LSTM (hidden=256, dropout=0.3) |
| Head | BatchNorm1d → Dropout(0.5) → Linear(256 → 101) |
| Input | 16 uniformly-sampled frames @ 224×224 |
| Output | Softmax over trained classes |
🏋️ Training Strategy
The model is trained incrementally, 5 classes at a time, using replay memory to avoid catastrophic forgetting:
- Group 0 → classes 0–4 (ApplyEyeMakeup, ApplyLipstick, Archery, BabyCrawling, BalanceBeam)
- Group 1 → classes 5–9 (BandMarching, BaseballPitch, Basketball, BasketballDunk, BenchPress)
- (More groups can be added by retraining and updating `config.json`)
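
One way to picture class-wise replay memory is a buffer that keeps a capped number of samples per class and returns them when the next group is trained. The sketch below is an assumption about how such a buffer could look, not the training code used for this model: `ClassWiseReplayMemory`, `classes_in_group`, the `per_class` cap, and the random-replacement policy are all hypothetical.

```python
import random
from collections import defaultdict


class ClassWiseReplayMemory:
    """Keeps at most `per_class` stored samples for every class seen so far."""

    def __init__(self, per_class: int = 20, seed: int = 0):
        self.per_class = per_class      # hypothetical cap, not taken from the README
        self.store = defaultdict(list)  # class index -> list of stored samples
        self.rng = random.Random(seed)

    def add(self, sample, label: int) -> None:
        bucket = self.store[label]
        if len(bucket) < self.per_class:
            bucket.append(sample)
        else:
            # Overwrite a random slot so the bucket never exceeds the cap.
            bucket[self.rng.randrange(self.per_class)] = sample

    def replay(self) -> list:
        """Return all stored (sample, label) pairs to mix into the next group's data."""
        return [(s, label) for label, bucket in self.store.items() for s in bucket]


def classes_in_group(group: int, group_size: int = 5) -> list:
    """Group g covers class indices g * 5 through g * 5 + 4."""
    start = group * group_size
    return list(range(start, start + group_size))
```

When training group 1, the training set would then be the new videos for `classes_in_group(1)` plus whatever `replay()` returns for classes 0–4.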
🔢 Currently Known Classes (10 / 101)
| Index | Class |
|---|---|
| 0 | ApplyEyeMakeup |
| 1 | ApplyLipstick |
| 2 | Archery |
| 3 | BabyCrawling |
| 4 | BalanceBeam |
| 5 | BandMarching |
| 6 | BaseballPitch |
| 7 | Basketball |
| 8 | BasketballDunk |
| 9 | BenchPress |
⚠️ If your video shows an action not in this list, the prediction will be unreliable.
🔧 Prediction Pipeline
- Smart Preprocessing: long videos are trimmed to the most motion-active 5-second segment, and an aspect-ratio crop is applied to remove black bars
- Multi-clip Voting: N random temporal clips are sampled and their predictions are averaged for a more robust result
- Masked Softmax: only logits for trained classes compete; unseen classes are masked out
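
The last two steps can be sketched with NumPy as follows. The function names `masked_softmax` and `vote` are hypothetical, and averaging per-clip probabilities (rather than logits) is an assumption; the app may combine clips differently.

```python
import numpy as np


def masked_softmax(logits: np.ndarray, active: list) -> np.ndarray:
    """Softmax over the `active` class indices only; every other class gets probability 0."""
    masked = np.full_like(logits, -np.inf, dtype=float)
    masked[..., active] = logits[..., active]
    masked -= masked.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    exp = np.exp(masked)                          # exp(-inf) is 0 for masked classes
    return exp / exp.sum(axis=-1, keepdims=True)


def vote(clip_logits: np.ndarray, active: list) -> tuple:
    """Average per-clip probabilities and return (predicted class index, confidence)."""
    probs = masked_softmax(clip_logits, active).mean(axis=0)
    best = int(probs.argmax())
    return best, float(probs[best])


# Stand-in for model output: 5 clips, 101 logits each (random numbers, not real predictions).
rng = np.random.default_rng(0)
clip_logits = rng.normal(size=(5, 101))
label, confidence = vote(clip_logits, active=list(range(10)))
```

Because masked classes receive exactly zero probability, `label` can only be one of the trained indices, whatever the raw logits for the other 91 classes look like.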
Files
| File | Purpose |
|---|---|
| `app.py` | Gradio app: inference + UI |
| `config.json` | Model config, class list, trained groups |
| `requirements.txt` | Python dependencies |
| `model_v2_groups0to1.pth` | Trained model weights |
Updating the Model
When you train more groups in Colab and get a new `.pth` file:
- Upload your new `.pth` to this repo (replace the old one or use a new filename)
- Edit `config.json`:
  - Update `"model_path"` if you renamed the file
  - Add the new group index to `"trained_groups"` (e.g. `[0, 1, 2]` after training group 2)
- Restart the Space; it will auto-reload
Example `config.json` update after training groups 0–3 (classes 0–19):
{
"model_path": "model_v2_groups0to3.pth",
"trained_groups": [0, 1, 2, 3]
}
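
To see how `"trained_groups"` maps to class indices, the sketch below expands the example config into the classes it covers. The helper `active_classes` is hypothetical, and it assumes every group holds 5 consecutive classes, as described under Training Strategy; app.py may derive the list differently.

```python
import json


def active_classes(config: dict, group_size: int = 5) -> list:
    """Expand trained group indices into the class indices they cover."""
    indices = []
    for group in sorted(config["trained_groups"]):
        start = group * group_size
        indices.extend(range(start, start + group_size))
    return indices


config = json.loads("""
{
  "model_path": "model_v2_groups0to3.pth",
  "trained_groups": [0, 1, 2, 3]
}
""")
classes = active_classes(config)
print(len(classes), classes[0], classes[-1])  # 20 0 19
```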
💡 Tips for Best Results
- Upload short videos (2–10 seconds) with a single, clear action
- Use `.mp4` format for best compatibility
- Increase the number of clips slider for more reliable predictions on ambiguous videos
- The confidence score indicates how certain the model is:
  - 🟢 ≥ 50% → High confidence
  - 🟡 30–50% → Medium confidence
  - 🔴 < 30% → Low confidence (action may not be in trained classes)
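
The confidence bands can be written as a small helper. This is only an illustration of the thresholds listed above; the function name `confidence_band` is hypothetical, and it assumes the score is a probability between 0 and 1.

```python
def confidence_band(confidence: float) -> str:
    """Map a probability in [0, 1] to the high / medium / low bands."""
    if confidence >= 0.50:
        return "high"
    if confidence >= 0.30:
        return "medium"
    return "low"
```

A score of exactly 0.30 falls in the medium band here, since the low band is defined as strictly below 30%.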