
NbAiLab/nb-asr-beta-qwen06b-lunde03-verbatim

Norwegian ASR checkpoint for the NB-ASR beta program

This repository contains an NB-ASR beta checkpoint based on Qwen3-ASR-0.6B, adapted by NbAiLab for Norwegian speech recognition evaluation and deployment testing.

Internal reference: lunde03-verbatim

Uploaded: 31.03.2026

The immediate purpose of this release is to support:

  • reproducible beta evaluation,
  • loading and inference validation in realistic environments,
  • and packaging of a reviewed checkpoint for Hugging Face distribution.

Confidential beta release: this model card and the associated weights are intended for approved evaluators and collaborators. Treat the checkpoint as beta material rather than a public production release.

Provenance

This HF repo was prepared from the local training artifact:

Qwen3-ASR-0.6B-lunde03_verbatim/checkpoint-50000

The packaging step selected the last checkpoint from that training run and copied the files required for inference and model loading into this staged Hugging Face repository.

Overview

This model is part of the NB-ASR beta group and is intended for technical evaluation, integration testing, and model-card maintenance in the Hugging Face workflow. It is suitable for:

  • local transcription experiments,
  • batch inference,
  • serving tests,
  • and end-to-end evaluation through the project's standard scripts.

Because this is a beta checkpoint, recognition behavior, formatting, and runtime characteristics may still change. Current results should be treated as provisional.

Recommended Usage

The preferred interface is the official qwen-asr package, which exposes both a standard transformers backend and a vLLM-backed serving path.

Install the base package

pip install -U qwen-asr

Install the vLLM extras

pip install -U "qwen-asr[vllm]"

Optional FlashAttention 2

pip install -U flash-attn --no-build-isolation

For lower-memory build environments:

MAX_JOBS=4 pip install -U flash-attn --no-build-isolation
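If a build fails quietly or you are unsure which optional packages actually made it into the environment, a quick importability check can save debugging time. This is a sketch; the module names below are inferred from the pip installs above:

```python
import importlib.util

def has_package(name: str) -> bool:
    """Return True if the module `name` is importable in this environment."""
    return importlib.util.find_spec(name) is not None

if __name__ == "__main__":
    # Module names corresponding to the install commands above.
    for pkg in ("qwen_asr", "vllm", "flash_attn"):
        print(f"{pkg}: {'installed' if has_package(pkg) else 'missing'}")
```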

Quick Start: Transformers Backend

import torch
from qwen_asr import Qwen3ASRModel

model = Qwen3ASRModel.from_pretrained(
    "NbAiLab/nb-asr-beta-qwen06b-lunde03-verbatim",
    dtype=torch.bfloat16,
    device_map="cuda:0",
    # attn_implementation="flash_attention_2",
    max_inference_batch_size=32,
    max_new_tokens=256,
)

results = model.transcribe(
    audio="audio.wav",
    language=None,
)

print(results[0].language)
print(results[0].text)

Notes:

  • audio can usually be provided as a local path, URL, base64 payload, or waveform tuple depending on backend support.
  • This repo includes a bundled example file, audio.wav, whose spoken text is "Hun er oversatt til en rekke språk, men ikke norsk" ("She has been translated into a number of languages, but not Norwegian").
  • language=None enables automatic language detection.
  • To force decoding in a known language, pass language="Norwegian", provided that label matches your environment and prompt conventions.
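Since the loader above accepts max_inference_batch_size, you may want to transcribe a directory of recordings in fixed-size batches. The helpers below are a sketch; whether transcribe accepts a list of paths depends on your qwen-asr version, so check its documentation first:

```python
from pathlib import Path
from typing import Iterator

def collect_wavs(directory: str) -> list[str]:
    """Collect .wav files in `directory`, sorted for reproducible ordering."""
    return sorted(str(p) for p in Path(directory).glob("*.wav"))

def iter_batches(paths: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield successive batches of at most `batch_size` paths."""
    for i in range(0, len(paths), batch_size):
        yield paths[i:i + batch_size]

if __name__ == "__main__":
    for batch in iter_batches(collect_wavs("recordings"), batch_size=32):
        # Hypothetical batched call; verify list support in your qwen-asr version:
        # results = model.transcribe(audio=batch, language=None)
        print(f"would transcribe {len(batch)} files")
```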

Quick Start: vLLM Backend

from qwen_asr import Qwen3ASRModel

if __name__ == "__main__":
    model = Qwen3ASRModel.LLM(
        model="NbAiLab/nb-asr-beta-qwen06b-lunde03-verbatim",
        gpu_memory_utilization=0.7,
        max_inference_batch_size=128,
        max_new_tokens=1024,
    )

    results = model.transcribe(
        audio="audio.wav",
        language=None,
    )

    print(results[0].language)
    print(results[0].text)

Serving

You can expose an OpenAI-compatible endpoint with:

qwen-asr-serve NbAiLab/nb-asr-beta-qwen06b-lunde03-verbatim \
  --gpu-memory-utilization 0.8 \
  --host 0.0.0.0 \
  --port 8000

Depending on the installed stack version, a standard vllm serve flow may also be appropriate.
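To exercise the served endpoint from Python without extra dependencies, you can hand-build the multipart request. The /v1/audio/transcriptions path follows the OpenAI transcription API shape and is an assumption here; confirm the route your qwen-asr-serve version actually exposes:

```python
import urllib.request
from pathlib import Path

API_URL = "http://localhost:8000/v1/audio/transcriptions"  # assumed route

def build_transcription_request(
    audio_bytes: bytes, filename: str, model: str, boundary: str = "qwenasrform"
) -> tuple[bytes, str]:
    """Build a multipart/form-data request body and Content-Type header value."""
    parts = [
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: audio/wav\r\n\r\n").encode() + audio_bytes + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ]
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

if __name__ == "__main__":
    body, content_type = build_transcription_request(
        Path("audio.wav").read_bytes(),
        "audio.wav",
        "NbAiLab/nb-asr-beta-qwen06b-lunde03-verbatim",
    )
    req = urllib.request.Request(API_URL, data=body, headers={"Content-Type": content_type})
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())
```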

Web Demo

To test the model in a local browser-based demo:

qwen-asr-demo \
  --asr-checkpoint NbAiLab/nb-asr-beta-qwen06b-lunde03-verbatim \
  --backend transformers \
  --cuda-visible-devices 0 \
  --ip 0.0.0.0 \
  --port 8000

Then open:

http://<your-ip>:8000

Feedback Requested

During the beta period, the most useful feedback is:

  • whether the model loads successfully,
  • environment and installation problems,
  • CUDA or OOM issues,
  • inference crashes,
  • batching or serving regressions,
  • and compatibility with downstream evaluation or synchronization workflows.

If possible, include:

  • GPU type,
  • Python version,
  • relevant package versions,
  • backend used,
  • approximate audio duration,
  • and any error trace or logs.
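A small helper can gather most of these details in one go. The package list below is a guess at what is relevant; trim it to your setup:

```python
import platform
from importlib import metadata

def environment_report(
    packages=("torch", "transformers", "qwen-asr", "vllm", "flash-attn"),
) -> dict[str, str]:
    """Collect Python/platform info plus installed versions of relevant packages."""
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for pkg in packages:
        try:
            report[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = "not installed"
    return report

if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

GPU type and approximate audio duration still need to be added by hand (or, when torch is available, via torch.cuda.get_device_name(0)).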

Included Files

This staged HF repository includes the inference-facing model assets copied from the source checkpoint:

  • model.safetensors
  • config.json
  • generation_config.json
  • preprocessor_config.json
  • tokenizer.json
  • tokenizer_config.json
  • special_tokens_map.json
  • vocab.json
  • merges.txt
  • added_tokens.json
  • chat_template.jinja
  • audio.wav

Training-state files such as optimizer state, scheduler state, RNG snapshots, and trainer metadata were intentionally left out of this HF package.
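After downloading a snapshot of this repo, a quick local check that all inference assets listed above are present can catch truncated or partial downloads. This is a sketch; audio.wav is excluded since it is only a bundled example:

```python
from pathlib import Path

# Inference-facing files this model card says the repository ships.
REQUIRED_FILES = {
    "model.safetensors",
    "config.json",
    "generation_config.json",
    "preprocessor_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "special_tokens_map.json",
    "vocab.json",
    "merges.txt",
    "added_tokens.json",
    "chat_template.jinja",
}

def missing_assets(snapshot_dir: str) -> set[str]:
    """Return the required files absent from a local snapshot directory."""
    root = Path(snapshot_dir)
    present = {p.name for p in root.iterdir()} if root.is_dir() else set()
    return REQUIRED_FILES - present

if __name__ == "__main__":
    missing = missing_assets("nb-asr-beta-qwen06b-lunde03-verbatim")
    print("OK" if not missing else f"missing: {sorted(missing)}")
```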

Intended Scope

This checkpoint is meant for technical evaluation and repo maintenance during the beta phase. It should not be treated as a stable public benchmark or final production model without further validation.

Acknowledgements

This model is based on the open Qwen3-ASR framework and was adapted by the NB-ASR project at the National Library of Norway.

The following people contributed to dataset creation and training:

  • Freddy Wetjen
  • Thea Tollersrud
  • Phoebe Parsons
  • Per Egil Kummervold