Smaller model planned?

#8
by downtown1629 - opened

Thank you for releasing such a great model! Are there any plans for a smaller model based on the new cache-aware architecture? I'd appreciate a smaller variant that retains the punctuation and capitalization functionality.

NVIDIA org

Thank you for the feedback, @downtown1629! We don't have plans for a smaller Nemotron-Speech-Streaming model at the moment, but this is very helpful input and definitely something we're interested in exploring. One possible direction could be a ~120M variant with a ~115M FastConformer cache-aware encoder and a single RNN-T decoder layer, along the lines of parakeet_realtime_eou_120m-v1, while preserving punctuation and capitalization. We'll keep this in mind as we consider future updates.

kunaldhawan changed discussion status to closed