XLSR models fine-tuned on 5 hours of speech in North American English dialects, using various augmentation strategies. The vocabulary is tiny: only the 40 most common sounds.
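
A minimal sketch of how a vocabulary of the 40 most common sounds could be derived from training transcripts. This is an illustration, not the method used for these models: it assumes transcripts are strings of space-separated sound symbols, and `build_vocab` and the toy transcripts are hypothetical names and data introduced here.

```python
from collections import Counter


def build_vocab(transcripts, size=40):
    """Map the `size` most frequent symbols to integer ids (hypothetical helper)."""
    # Assumption: each transcript is a string of space-separated sound symbols.
    counts = Counter(sym for t in transcripts for sym in t.split())
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Keep only the top `size` symbols; rarer ones are dropped from the vocabulary.
    return {sym: i for i, (sym, _) in enumerate(ranked[:size])}


# Toy, made-up transcripts used only to show the call.
toy = ["k ae t", "b ae t", "k ae b"]
vocab = build_vocab(toy, size=3)
```

In a real setup, a CTC tokenizer would typically also need special tokens (padding, unknown, word boundary) on top of the 40 sounds; whether and how these models include them is not stated here.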