vlbthambawita's Collections
Transformer-based Models for Computer Vision
MIO: A Foundation Model on Multimodal Tokens
Paper
• 2409.17692
• Published • 53
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Paper
• 2010.11929
• Published • 15
Going deeper with Image Transformers
Paper
• 2103.17239
• Published
Training data-efficient image transformers & distillation through attention
Paper
• 2012.12877
• Published • 2
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Paper
• 2103.14030
• Published • 5
Masked Autoencoders Are Scalable Vision Learners
Paper
• 2111.06377
• Published • 6
DINOv2: Learning Robust Visual Features without Supervision
Paper
• 2304.07193
• Published • 9
Emerging Properties in Self-Supervised Vision Transformers
Paper
• 2104.14294
• Published • 4
BEiT: BERT Pre-Training of Image Transformers
Paper
• 2106.08254
• Published • 2
Learning Transferable Visual Models From Natural Language Supervision
Paper
• 2103.00020
• Published • 21
How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
Paper
• 2106.10270
• Published • 3
Biomedical SAM 2: Segment Anything in Biomedical Images and Videos
Paper
• 2408.03286
• Published
SAM 2: Segment Anything in Images and Videos
Paper
• 2408.00714
• Published • 122