Smaller model planned?

#8
by downtown1629 - opened

Thank you for releasing such a great model! Are there any plans for a smaller model based on the new cache-aware architecture? I'd appreciate a smaller variant that retains the punctuation and capitalization functionality.

NVIDIA org

Thank you for the feedback, @downtown1629! We don't have plans for a smaller Nemotron-Speech-Streaming model at the moment, but this is very helpful input and definitely something we're interested in exploring. One possible direction could be a ~120M variant with a ~115M FastConformer cache-aware encoder and a single RNN-T decoder layer, along the lines of parakeet_realtime_eou_120m-v1, while preserving punctuation and capitalization. We'll keep this in mind as we consider future updates.

kunaldhawan changed discussion status to closed