🎬 UCF101 Video Action Classifier

A video action recognition app built with ResNet18 + 2-layer LSTM, trained incrementally on the UCF101 dataset using class-wise replay memory.

🧠 Model Architecture

| Component | Details |
|---|---|
| Backbone | ResNet18 (layer3 + layer4 unfrozen) |
| Temporal | 2-layer LSTM (hidden=256, dropout=0.3) |
| Head | BatchNorm1d → Dropout(0.5) → Linear(256 → 101) |
| Input | 16 uniformly-sampled frames @ 224×224 |
| Output | Softmax over trained classes |
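
The 16-frame input can be illustrated with a small index-sampling sketch. It assumes "uniformly sampled" means evenly spaced indices from the first to the last frame; the function name `uniform_frame_indices` is hypothetical, and the app itself may choose frames differently (for example, segment midpoints).

```python
def uniform_frame_indices(total_frames, num_samples=16):
    """Return num_samples evenly spaced frame indices in [0, total_frames - 1].

    Assumption: endpoints are included. Videos shorter than num_samples
    frames simply repeat some indices.
    """
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    if num_samples == 1:
        return [0]
    last = total_frames - 1
    # Spread the samples evenly between the first and last frame.
    return [round(i * last / (num_samples - 1)) for i in range(num_samples)]


indices = uniform_frame_indices(300)
print(len(indices), indices[0], indices[-1])
```

Each selected frame would then be resized to 224×224 before being passed to the backbone.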

The model is trained incrementally, 5 classes at a time, with replay memory to limit catastrophic forgetting:

- Group 0 → classes 0–4 (ApplyEyeMakeup, ApplyLipstick, Archery, BabyCrawling, BalanceBeam)
- Group 1 → classes 5–9 (BandMarching, BaseballPitch, Basketball, BasketballDunk, BenchPress)

More groups can be added by retraining and updating config.json.
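
The group-to-class mapping above follows a simple rule: group g covers class indices 5g through 5g + 4. A minimal sketch, assuming the final group is truncated at UCF101's 101 classes; the helper name `classes_for_groups` is hypothetical and not taken from `app.py`:

```python
GROUP_SIZE = 5     # classes added per incremental training step
NUM_CLASSES = 101  # total classes in UCF101


def classes_for_groups(trained_groups, group_size=GROUP_SIZE, num_classes=NUM_CLASSES):
    """Return the sorted class indices covered by the given group indices."""
    indices = []
    for group in sorted(set(trained_groups)):
        start = group * group_size
        stop = min(start + group_size, num_classes)  # last group may be shorter
        indices.extend(range(start, stop))
    return indices


print(classes_for_groups([0, 1]))  # class indices 0 through 9
```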

⚠️ If your video shows an action not in this list, the prediction will be unreliable.

- Smart Preprocessing: long videos are trimmed to their most motion-active 5-second segment, and an aspect-ratio crop removes black bars.
- Multi-clip Voting: N random temporal clips are sampled and their predictions are averaged for more robust results.
- Masked Softmax: only logits for trained classes compete; unseen classes are masked out.
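
Masked softmax and multi-clip voting can be sketched in plain Python. This is an illustration under assumptions, not the code in `app.py`: it assumes voting averages per-clip probabilities (rather than logits), and the function names are hypothetical.

```python
import math


def masked_softmax(logits, trained_classes):
    """Softmax over trained class indices only; other classes get probability 0."""
    allowed = set(trained_classes)
    peak = max(logits[i] for i in allowed)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) if i in allowed else 0.0 for i, x in enumerate(logits)]
    total = sum(exps)
    return [e / total for e in exps]


def average_clip_probs(clip_logits, trained_classes):
    """Average the masked-softmax probabilities of several clips."""
    probs = [masked_softmax(logits, trained_classes) for logits in clip_logits]
    return [sum(column) / len(probs) for column in zip(*probs)]


# Toy example: 4 output logits, but only classes 0 and 1 have been trained.
clips = [[2.0, 1.0, 9.0, 0.0], [0.0, 1.0, 9.0, 3.0]]
avg = average_clip_probs(clips, trained_classes=[0, 1])
predicted = max(range(len(avg)), key=avg.__getitem__)
```

Note that the large logit for class 2 never influences the result, because untrained classes are excluded before normalization.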

| File | Purpose |
|---|---|
| `app.py` | Gradio app (inference + UI) |
| `config.json` | Model config, class list, trained groups |
| `requirements.txt` | Python dependencies |
| `model_v2_groups0to1.pth` | Trained model weights |

When you train more groups in Colab and get a new .pth file:

1. Upload the new .pth to this repo (replace the old file or use a new filename).
2. Edit config.json:
   - Update "model_path" if you renamed the file.
   - Add the new group index to "trained_groups" (e.g. [0, 1, 2] after training group 2).
3. Restart the Space; it will reload automatically.

Example config.json update after training groups 0–3 (classes 0–19):

{
  "model_path": "model_v2_groups0to3.pth",
  "trained_groups": [0, 1, 2, 3]
}
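
The same edit can be scripted instead of done by hand. A minimal sketch using only the standard library; it assumes config.json contains the "model_path" and "trained_groups" keys shown above and leaves every other key untouched. The helper name `register_trained_group` is hypothetical.

```python
import json
from pathlib import Path


def register_trained_group(config_path, group, model_path=None):
    """Add a group index to "trained_groups" and optionally update "model_path"."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))

    # Keep the list sorted and free of duplicates.
    groups = set(config.get("trained_groups", []))
    groups.add(group)
    config["trained_groups"] = sorted(groups)

    if model_path is not None:
        config["model_path"] = model_path

    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Example call after training group 3:
# register_trained_group("config.json", 3, model_path="model_v2_groups0to3.pth")
```

Calling the helper twice with the same group is harmless, since the group list is deduplicated before it is written back.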

- Upload short videos (2–10 seconds) with a single, clear action.
- Use .mp4 format for best compatibility.
- Increase the number-of-clips slider for more reliable predictions on ambiguous videos.
- The confidence score indicates how certain the model is:
  - 🟢 ≥ 50%: High confidence
  - 🟡 30–50%: Medium confidence
  - 🔴 < 30%: Low confidence (action may not be in trained classes)
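
The confidence bands above translate directly into a small threshold function. A sketch, assuming the confidence is the top averaged probability expressed in the range 0 to 1; the name `confidence_label` is hypothetical.

```python
def confidence_label(confidence):
    """Map a confidence in [0, 1] to the three bands listed above."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= 0.50:
        return "high"
    if confidence >= 0.30:
        return "medium"
    return "low"  # the action may not be among the trained classes


print(confidence_label(0.42))
```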
