Request for fine-tuning script/parameters for parakeet-tdt-0.6b-v3-fr-tv-media
Hi @Archime,
Thank you for sharing this model! I’ve been testing it on French media content, and the performance is impressive.
I am interested in fine-tuning the model further on a niche dataset. Would you be willing to share the training script, configuration files, or the specific hyperparameters (learning rate, scheduler, freezing layers, etc.) used for this version?
Any insights into the preprocessing pipeline for the fr-tv-media data would also be incredibly helpful.
Thanks again for your contribution to the community!
Best regards,
Mahamadi
Hi @madoss,
Thank you again for your feedback!
As for the training scripts, I used the official NeMo framework. You can find everything you need directly in the official repository: https://github.com/NVIDIA-NeMo/NeMo
To help you get started with your niche dataset, here is the exact configuration I used for the parakeet-tdt-0.6b-v3-fr-tv-media fine-tuning:
I kept the encoder frozen (freeze_encoder: True) to retain the base acoustic representations.
name: "parakeet-tdt-0.6b-v3-finetune-french-tv-media"
manifest_dir: ???  # Mandatory value: set to the root directory of your manifests
init_from_pretrained_model: "nvidia/parakeet-tdt-0.6b-v3"
freeze_encoder: True
freeze_decoder: False

model:
  sample_rate: 16000
  compute_eval_loss: false  # Saves VRAM during validation
  log_prediction: true  # Logs sample transcriptions to the console
  skip_nan_grad: false  # Important for debugging BF16 anomalies

  train_ds:
    manifest_filepath:
      - ${manifest_dir}/sdp-ft-media-info/splits/train.json
      - ${manifest_dir}/sdp-ft-media-societe/splits/train.json
      - ${manifest_dir}/sdp-ft-media-divertissements/splits/train.json
      - ${manifest_dir}/sdp-ft-media-documentaires/splits/train.json
      - ${manifest_dir}/sdp-ft-media-sports/splits/train.json
    sample_rate: ${model.sample_rate}
    batch_size: 2
    shuffle: true
    num_workers: 2  # Can be increased (e.g., to 8) to feed GPUs more efficiently
    pin_memory: true
    max_duration: 20.0
    min_duration: 1.0
    is_tarred: false
    bucketing_strategy: "fully_randomized"

  validation_ds:
    manifest_filepath:  # To be generated by the dataset split script
      - ${manifest_dir}/sdp-ft-media-info/splits/validation.json
      - ${manifest_dir}/sdp-ft-media-sports/splits/validation.json
      - ${manifest_dir}/sdp-ft-media-societe/splits/validation.json
      - ${manifest_dir}/sdp-ft-media-divertissements/splits/validation.json
      - ${manifest_dir}/sdp-ft-media-documentaires/splits/validation.json
    sample_rate: ${model.sample_rate}
    batch_size: 1  # Matches training batch size scale to prevent OOM errors
    shuffle: false
    num_workers: 2
    pin_memory: true
    max_duration: 20.0

  test_ds:
    manifest_filepath:
      - ${manifest_dir}/sdp-ft-media-info/splits/test.json
      - ${manifest_dir}/sdp-ft-media-societe/splits/test.json
      - ${manifest_dir}/sdp-ft-media-divertissements/splits/test.json
      - ${manifest_dir}/sdp-ft-media-documentaires/splits/test.json
      - ${manifest_dir}/sdp-ft-media-sports/splits/test.json
    sample_rate: ${model.sample_rate}
    batch_size: 1
    shuffle: false
    num_workers: 2
    pin_memory: true
    max_duration: 20.0

  # Retaining the original tokenizer since the target language remains French
  tokenizer:
    update_tokenizer: false
    # dir: null
    # type: bpe

  loss:
    loss_name: "tdt"
    ctc_loss_weight: 0.3  # Explicit 0.3 weight
    tdt_kwargs:
      fastemit_lambda: 0.0
      clamp: -1.0
      durations:  # Matches model_defaults

  # Data Augmentation (Spectrogram masking)
  # Crucial for preventing overfitting on a 55-hour dataset
  spec_augment:
    _target_: nemo.collections.asr.modules.SpectrogramAugmentation
    freq_masks: 2
    time_masks: 5  # Slightly reduced to avoid overly degrading the audio signal
    freq_width: 27
    time_width: 0.05

  optim:
    name: adamw
    lr: 1e-4
    betas: [0.9, 0.98]
    weight_decay: 1e-3
    sched:
      name: CosineAnnealing
      warmup_steps: 2000  # ~10% of estimated total steps (adjust based on batch size)
      min_lr: 5e-6

trainer:
  devices: -1  # -1 automatically uses all available GPUs (e.g., 2x RTX 3090)
  num_nodes: 1
  max_epochs: 40  # 20-30 epochs are often enough to converge without overfitting on 55h of data
  val_check_interval: 1.0
  accelerator: gpu
  strategy:
    _target_: lightning.pytorch.strategies.DDPStrategy
    find_unused_parameters: false  # Optimization for DDP (Multi-GPU)
  # Gradient Accumulation
  # Effective Batch Size = batch_size (2) * devices (2) * accumulate_grad_batches (16) = 64
  accumulate_grad_batches: 16
  gradient_clip_val: 1.0  # Prevents gradient explosion
  precision: 16-mixed  # FP16 mixed precision (use bf16-mixed for BF16). Essential for FastConformer Large on <40GB VRAM GPUs
  log_every_n_steps: 10
  enable_progress_bar: True
  # Must be False so exp_manager handles checkpoints exclusively.
  # Otherwise, a CheckpointMisconfigurationError occurs due to conflicting saving mechanisms.
  enable_checkpointing: False
  # Must be False as exp_manager already handles the logger.
  logger: False

exp_manager:
  exp_dir: "nemo_parakeet-v3"
  name: ${name}
  create_tensorboard_logger: true
  create_checkpoint_callback: true
  checkpoint_callback_params:
    monitor: "val_wer"  # Monitors the Word Error Rate (WER)
    mode: "min"
    save_top_k: 3  # Retains the 3 best checkpoints
    always_save_nemo: True
  resume_if_exists: true
  resume_ignore_no_checkpoint: true
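As a rough sanity check on the effective batch size and warmup comments in the config, the arithmetic can be sketched in Python. This is only an illustration, not part of the training pipeline: the helper names are made up for this example, `devices=2` follows the 2x RTX 3090 case mentioned in the config, and `num_train_utterances` is a hypothetical placeholder (the real value is the number of lines in your training manifests), so the step count is an estimate rather than a figure from the actual run.

```python
import math


def effective_batch_size(batch_size: int, devices: int, accumulate_grad_batches: int) -> int:
    """Samples contributing to one optimizer step under DDP with gradient accumulation."""
    return batch_size * devices * accumulate_grad_batches


def total_optimizer_steps(num_samples: int, eff_batch: int, max_epochs: int) -> int:
    """Approximate optimizer steps for the whole run (a partial last batch counts as one step)."""
    return math.ceil(num_samples / eff_batch) * max_epochs


# Values taken from the config; devices=2 is the 2x RTX 3090 example.
eff = effective_batch_size(batch_size=2, devices=2, accumulate_grad_batches=16)

# HYPOTHETICAL: replace with the number of lines in your own train manifests.
num_train_utterances = 32_000

steps = total_optimizer_steps(num_train_utterances, eff, max_epochs=40)
warmup_fraction = 2000 / steps  # warmup_steps from the config

print(f"effective batch size: {eff}")
print(f"estimated optimizer steps: {steps}")
print(f"warmup fraction: {warmup_fraction:.1%}")
```

If the printed warmup fraction is far from the ~10% target for your dataset size or GPU count, scaling warmup_steps in proportion is a reasonable starting point.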
Let me know if you need any clarifications on the pipeline. Good luck with your fine-tuning!
Best regards,
Archime
Thank you.
I will start by fine-tuning on your dataset.