🎬 UCF101 Video Action Classifier

A video action recognition app built with ResNet18 + 2-layer LSTM, trained incrementally on the UCF101 dataset using class-wise replay memory.


🧠 Model Architecture

| Component | Details |
| --- | --- |
| Backbone | ResNet18 (layer3 + layer4 unfrozen) |
| Temporal | 2-layer LSTM (hidden=256, dropout=0.3) |
| Head | BatchNorm1d → Dropout(0.5) → Linear(256 → 101) |
| Input | 16 uniformly-sampled frames @ 224×224 |
| Output | Softmax over trained classes |

🏋️ Training Strategy

The model is trained incrementally, 5 classes at a time, using replay memory to avoid catastrophic forgetting:

  • Group 0 → classes 0–4 (ApplyEyeMakeup, ApplyLipstick, Archery, BabyCrawling, BalanceBeam)
  • Group 1 → classes 5–9 (BandMarching, BaseballPitch, Basketball, BasketballDunk, BenchPress)
  • (More groups can be added by retraining and updating config.json)
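
One common way to implement class-wise replay is to keep a small, capped buffer of samples per class and mix those into the training set for each new group. The sketch below illustrates that idea only; `ClassReplayMemory`, `mixed_training_set`, the per-class cap, and the random-overwrite policy are hypothetical and may differ from what the training notebook does.

```python
import random
from collections import defaultdict


class ClassReplayMemory:
    """Keeps at most `per_class` samples for every class label (illustrative)."""

    def __init__(self, per_class=20, seed=0):
        self.per_class = per_class
        self.store = defaultdict(list)
        self.rng = random.Random(seed)

    def add(self, label, sample):
        bucket = self.store[label]
        if len(bucket) < self.per_class:
            bucket.append(sample)
        else:
            # Bucket is full: overwrite a randomly chosen stored sample.
            bucket[self.rng.randrange(self.per_class)] = sample

    def replay(self, k):
        """Draw up to k stored samples per class as (sample, label) pairs."""
        drawn = []
        for label, bucket in self.store.items():
            for sample in self.rng.sample(bucket, min(k, len(bucket))):
                drawn.append((sample, label))
        return drawn


def mixed_training_set(new_pairs, memory, k):
    """New-group samples plus replayed samples from previously seen classes."""
    return list(new_pairs) + memory.replay(k)
```

When training group 1, for example, `new_pairs` would hold samples for classes 5–9 while the memory replays stored samples for classes 0–4, so the earlier classes keep contributing to the loss.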

🟢 Currently Known Classes (10 / 101)

| Index | Class |
| --- | --- |
| 0 | ApplyEyeMakeup |
| 1 | ApplyLipstick |
| 2 | Archery |
| 3 | BabyCrawling |
| 4 | BalanceBeam |
| 5 | BandMarching |
| 6 | BaseballPitch |
| 7 | Basketball |
| 8 | BasketballDunk |
| 9 | BenchPress |

⚠️ If your video shows an action not in this list, the prediction will be unreliable.


🔧 Prediction Pipeline

  1. Smart Preprocessing: long videos are trimmed to the most motion-active 5-second segment, and an aspect-ratio crop is applied to remove black bars
  2. Multi-clip Voting: N random temporal clips are sampled and their predictions are averaged for a more robust result
  3. Masked Softmax: only logits for trained classes compete; unseen classes are masked out
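
The three steps can be illustrated with plain Python. This is a sketch under stated assumptions rather than the code in app.py: it assumes a per-frame motion score is already available (for example a mean absolute frame difference), and that voting averages per-clip probabilities rather than logits. The function names are hypothetical.

```python
import math


def most_active_window(motion, fps, seconds=5.0):
    """Return (start, end) frame indices of the window with the largest total motion."""
    win = max(1, int(round(fps * seconds)))
    if len(motion) <= win:
        return 0, len(motion)            # short video: keep everything
    current = sum(motion[:win])
    best, best_start = current, 0
    for start in range(1, len(motion) - win + 1):
        # Slide the window by one frame: add the new frame, drop the old one.
        current += motion[start + win - 1] - motion[start - 1]
        if current > best:
            best, best_start = current, start
    return best_start, best_start + win


def masked_softmax(logits, known):
    """Softmax over the `known` class indices; all other classes get probability 0."""
    known = sorted(set(known))
    peak = max(logits[i] for i in known)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in known}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


def vote(clip_logits, known):
    """Average masked probabilities across clips; return (class index, confidence)."""
    probs = [masked_softmax(logits, known) for logits in clip_logits]
    mean = [sum(column) / len(probs) for column in zip(*probs)]
    best = max(range(len(mean)), key=mean.__getitem__)
    return best, mean[best]
```

Because untrained classes are excluded before normalisation, a large logit on an unseen class cannot win the vote or dilute the reported confidence.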

📂 Files

| File | Purpose |
| --- | --- |
| app.py | Gradio app: inference + UI |
| config.json | Model config, class list, trained groups |
| requirements.txt | Python dependencies |
| model_v2_groups0to1.pth | Trained model weights |

🔄 Updating the Model

When you train more groups in Colab and get a new .pth file:

  1. Upload your new .pth to this repo (replace the old one or use a new filename)
  2. Edit config.json:
    • Update "model_path" if you renamed the file
    • Add the new group index to "trained_groups" (e.g. [0, 1, 2] after training group 2)
  3. Restart the Space; it reloads the model automatically

Example config.json update after training groups 0–3 (classes 0–19):

{
  "model_path": "model_v2_groups0to3.pth",
  "trained_groups": [0, 1, 2, 3]
}
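
To show how "trained_groups" translates into the set of classes that compete at inference, here is a small sketch. It assumes five classes per group, as in the training strategy, and that only the two keys shown in the example are needed; the function name `known_class_indices` is hypothetical, and app.py may read the config differently.

```python
import json

GROUP_SIZE = 5      # five classes per group, per the training strategy
NUM_CLASSES = 101   # UCF101


def known_class_indices(config_text):
    """Return (model_path, sorted class indices) for a config.json string."""
    cfg = json.loads(config_text)
    indices = []
    for group in sorted(set(cfg["trained_groups"])):
        start = group * GROUP_SIZE
        # The last group is shorter because 101 is not a multiple of 5.
        indices.extend(range(start, min(start + GROUP_SIZE, NUM_CLASSES)))
    return cfg["model_path"], indices


example = '{"model_path": "model_v2_groups0to3.pth", "trained_groups": [0, 1, 2, 3]}'
path, classes = known_class_indices(example)
print(path, classes[0], classes[-1])
```

With the example config, the known classes are indices 0 through 19.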

💡 Tips for Best Results

  • Upload short videos (2–10 seconds) with a single, clear action
  • Use .mp4 format for best compatibility
  • Increase the number of clips slider for more reliable predictions on ambiguous videos
  • The confidence score indicates how certain the model is:
    • 🟢 ≥ 50%: High confidence
    • 🟡 30–50%: Medium confidence
    • 🔴 < 30%: Low confidence (the action may not be among the trained classes)
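
The confidence bands above amount to two thresholds. A minimal sketch, assuming the confidence is a probability between 0 and 1 and using a hypothetical function name:

```python
def confidence_band(confidence):
    """Map a probability in [0, 1] to the high / medium / low bands listed above."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.5:
        return "high"
    if confidence >= 0.3:
        return "medium"
    return "low"
```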