Add code snippet for nemo streaming inference

README.md
Here, chunk size = current frame + right context; each chunk is processed in a non-overlapping fashion.
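To make that relationship concrete, here is a small illustrative sketch (not part of the NeMo API; the 80 ms encoder frame duration is an assumption that depends on the model's subsampling factor) of how the right context in `att_context_size` determines the chunk size and, with it, the per-chunk latency:

```python
# Illustrative sketch only (not a NeMo API): how `att_context_size`
# relates to chunk size and per-chunk latency in cache-aware streaming.
# Assumes an 80 ms encoder frame duration (hypothetical; the real value
# depends on the model's feature stride and subsampling factor).
FRAME_MS = 80

def chunk_latency_ms(att_context_size):
    left, right = att_context_size
    chunk_frames = 1 + right          # chunk size = current frame + right context
    return chunk_frames * FRAME_MS    # duration covered by one non-overlapping chunk

# A larger right context means a bigger chunk and higher latency.
print(chunk_latency_ms([70, 0]))    # 1 frame per chunk  -> 80 ms
print(chunk_latency_ms([70, 13]))   # 14 frames per chunk -> 1120 ms
```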
You can also run streaming inference through the pipeline method, which uses [this config](https://github.com/NVIDIA-NeMo/NeMo/blob/main/examples/asr/conf/asr_streaming_inference/cache_aware_rnnt.yaml) to build end-to-end workflows with punctuation and capitalization (PnC), inverse text normalization (ITN), and translation support.

```python
from nemo.collections.asr.inference.factory.pipeline_builder import PipelineBuilder
from omegaconf import OmegaConf

# Path to the cache-aware config file downloaded from the link above
cfg_path = 'cache_aware_rnnt.yaml'
cfg = OmegaConf.load(cfg_path)

# Paths of all the audio files to run inference on
audios = ['/path/to/your/audio.wav']

# Build the streaming inference pipeline and run it
pipeline = PipelineBuilder.build_pipeline(cfg)
output = pipeline.run(audios)

# Print the transcription for each audio file
for entry in output:
    print(entry['text'])
```

### Input