# gliner2-large-v1-onnx

ONNX export of fastino/gliner2-large-v1 for zero-shot Named Entity Recognition.

GLiNER2 can recognise any entity type at inference time, with no fine-tuning required. Simply pass a list of labels alongside your text, and the model scores every candidate span for each label.

This repository contains a single monolithic ONNX file (encoder + span head) that can be run with any ONNX Runtime backend (CPU, CUDA, CoreML, DirectML, etc.).
## Model interface

### Inputs

| Name | Shape | Type | Description |
|---|---|---|---|
| `input_ids` | `(1, seq_len)` | int64 | Token IDs produced by the tokenizer with `is_pretokenized=True`, `add_special_tokens=False` |
| `attention_mask` | `(1, seq_len)` | int64 | Attention mask (1 for real tokens, 0 for padding) |
| `text_positions` | `(num_words,)` | int64 | Index of the first token of each word inside `input_ids` |
| `schema_positions` | `(1 + num_fields,)` | int64 | Index of the `[P]` token, then the index of each `[E]` token |
| `span_idx` | `(1, num_words * max_width, 2)` | int64 | All `(start, end)` word-index pairs with `end - start <= max_width` (padded with 0) |
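As a sketch of how `span_idx` can be constructed: the start-major, then-width row ordering below is an assumption inferred from the `(1, num_words * max_width, 2)` shape, so verify it against example.py.

```python
import numpy as np

def build_span_idx(num_words: int, max_width: int = 8) -> np.ndarray:
    """Enumerate all candidate (start, end) word spans, padded with (0, 0).

    Assumes rows are ordered by start word, then by span width, matching
    the (1, num_words * max_width, 2) shape the model expects.
    """
    spans = np.zeros((num_words * max_width, 2), dtype=np.int64)
    for start in range(num_words):
        for width in range(max_width):
            end = start + width  # inclusive end word index
            if end < num_words:
                spans[start * max_width + width] = (start, end)
            # spans running past the last word stay (0, 0) as padding
    return spans[None, :, :]  # add the leading batch dimension

span_idx = build_span_idx(num_words=3, max_width=2)
```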
### Outputs

| Name | Shape | Type | Description |
|---|---|---|---|
| `span_scores` | `(1, num_fields, num_words, max_width)` | float32 | Score for each (label, start_word, width) combination. Apply a threshold (e.g. 0.5) to extract entities. |
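The thresholding step can be sketched like this; the helper name and output format are illustrative, not part of the model's API:

```python
import numpy as np

def decode_spans(span_scores, labels, words, threshold=0.5):
    """Turn (1, num_fields, num_words, max_width) scores into (label, text, score) triples."""
    _, num_fields, num_words, max_width = span_scores.shape
    entities = []
    for f in range(num_fields):
        for start in range(num_words):
            for width in range(max_width):
                end = start + width  # inclusive end word index
                if end >= num_words:
                    continue  # padded span, never a real entity
                score = float(span_scores[0, f, start, width])
                if score >= threshold:
                    entities.append((labels[f], " ".join(words[start:end + 1]), score))
    return entities

# Toy example: 2 labels, 3 words, max_width 2, one span above threshold.
scores = np.zeros((1, 2, 3, 2), dtype=np.float32)
scores[0, 1, 0, 1] = 0.75  # label index 1, words 0..1
print(decode_spans(scores, ["person", "city"], ["new", "york", "city"]))
```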
## Constants

- `max_width = 8` (maximum span length in words)
- `[SEP_TEXT]` token ID = 250103 (separates schema from text in the input sequence)
## Schema format

The input sequence is constructed as:

```
( [P] entities ( [E] label1 [E] label2 ... ) ) [SEP_TEXT] word1 word2 ...
```

The tokenizer from the original model (`tokenizer.json`) must be used with `is_pretokenized=True` and `add_special_tokens=False`.
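Building that prefix programmatically is straightforward. This sketch assumes the bracketed layout shown above, keeping each piece as a separate word so it can be fed to pre-tokenized encoding:

```python
def build_schema_words(labels):
    """Schema prefix as a list of words, ready for is_pretokenized encoding."""
    words = ["(", "[P]", "entities", "("]
    for label in labels:
        words += ["[E]", label]
    words += [")", ")", "[SEP_TEXT]"]
    return words

schema = build_schema_words(["person", "city"])
```

The text's own words are then appended after the `[SEP_TEXT]` entry to form the full pre-tokenized input.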
## Word splitting

Words are split from the raw text using `WhitespaceTokenSplitter`:

```python
import re

WORD_RE = re.compile(
    r"(?:https?://[^\s]+|www\.[^\s]+)"          # URLs
    r"|[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"   # e-mail addresses
    r"|@[a-z0-9_]+"                             # @handles
    r"|\w+(?:[-_]\w+)*"                         # words, incl. hyphen/underscore compounds
    r"|\S",                                     # any other single non-space character
    re.IGNORECASE,
)

words = [(m.group(), m.start(), m.end()) for m in WORD_RE.finditer(text)]
```
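For example, the splitter keeps URLs and e-mail addresses as single words (the regex is repeated here so the snippet is self-contained):

```python
import re

WORD_RE = re.compile(
    r"(?:https?://[^\s]+|www\.[^\s]+)"
    r"|[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"
    r"|@[a-z0-9_]+"
    r"|\w+(?:[-_]\w+)*"
    r"|\S",
    re.IGNORECASE,
)

text = "Email john.doe@example.com or visit https://example.com/docs"
words = [m.group() for m in WORD_RE.finditer(text)]
# → ['Email', 'john.doe@example.com', 'or', 'visit', 'https://example.com/docs']
```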
## Usage

### Python

See example.py for a complete runnable script. Summary:

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
session = ort.InferenceSession("model.onnx")

text = "Steve Jobs founded Apple Inc. in Cupertino, California on April 1, 1976."
labels = ["person", "company", "city", "date"]

# 1. Split text into words
# 2. Build schema: ( [P] entities ( [E] person [E] company ... ) ) [SEP_TEXT] word1 word2 ...
# 3. Tokenize with is_pretokenized=True, add_special_tokens=False
# 4. Build span_idx, text_positions, schema_positions
# 5. Run inference and threshold span_scores >= 0.5
# 6. Map word spans back to character offsets

outputs = session.run(None, {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
    "text_positions": text_positions,
    "schema_positions": schema_positions,
    "span_idx": span_idx,
})
```
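Step 6 maps word-level spans back to character offsets using the `(word, start, end)` triples the word splitter already produces. A minimal sketch (the helper name is illustrative):

```python
def to_char_offsets(word_offsets, start_word, end_word):
    """Map an inclusive word-index span to character offsets.

    word_offsets is the list of (word, char_start, char_end) triples
    produced by WORD_RE.finditer over the original text.
    """
    return word_offsets[start_word][1], word_offsets[end_word][2]

offsets = [("Steve", 0, 5), ("Jobs", 6, 10), ("founded", 11, 18)]
print(to_char_offsets(offsets, 0, 1))  # span "Steve Jobs" → (0, 10)
```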
### Node.js

See example.mjs for a complete runnable script. Summary:

```javascript
import fs from "node:fs";
import * as ort from "onnxruntime-node";
import { Tokenizer } from "tokenizers";

// Load the tokenizer, stripping pre_tokenizer/decoder/post_processor/normalizer,
// as the npm package doesn't support these custom wrappers.
const tokenizerJson = JSON.parse(fs.readFileSync("tokenizer.json", "utf-8"));
delete tokenizerJson.pre_tokenizer;
delete tokenizerJson.decoder;
delete tokenizerJson.post_processor;
delete tokenizerJson.normalizer;
const tokenizer = Tokenizer.fromString(JSON.stringify(tokenizerJson));

const session = await ort.InferenceSession.create("model.onnx");

// 1. Split text into words (lowercase)
// 2. Build schema: ( [P] entities ( [E] person [E] company ... ) ) [SEP_TEXT] word1 word2 ...
// 3. Encode each word individually with a ▁ (U+2581) prefix to mimic is_pretokenized
// 4. Build span_idx, text_positions, schema_positions as BigInt64Arrays
// 5. Run inference and threshold span_scores >= 0.5

const results = await session.run({
  input_ids: new ort.Tensor("int64", inputIds, [1, seqLen]),
  attention_mask: new ort.Tensor("int64", attentionMask, [1, seqLen]),
  text_positions: new ort.Tensor("int64", textPositions, [numWords]),
  schema_positions: new ort.Tensor("int64", schemaPositions, [numSchemaPos]),
  span_idx: new ort.Tensor("int64", spanIdx, [1, numWords * 8, 2]),
});
```

Requires `npm install onnxruntime-node tokenizers`.
### Rust

See example.rs for a complete example using the ort crate. Summary:

```rust
use ort::session::Session;
use ort::value::Tensor;
use tokenizers::Tokenizer;

let tokenizer = Tokenizer::from_file("tokenizer.json")?;
let mut session = Session::builder()?.commit_from_file("model.onnx")?;

// 1. Split text into words (lowercase)
// 2. Build schema: ( [P] entities ( [E] person [E] company ... ) ) [SEP_TEXT] word1 word2 ...
// 3. Tokenize with Vec<&str> (pre-tokenized input)
// 4. Build span_idx, text_positions, schema_positions
// 5. Run inference and threshold span_scores >= 0.5

let outputs = session.run(ort::inputs![
    "input_ids" => Tensor::from_array((vec![1, seq_len], token_ids))?,
    "attention_mask" => Tensor::from_array((vec![1, seq_len], mask))?,
    "text_positions" => Tensor::from_array((vec![num_words], text_pos))?,
    "schema_positions" => Tensor::from_array((vec![num_schema], schema_pos))?,
    "span_idx" => Tensor::from_array((vec![1, num_spans, 2], spans))?,
])?;
let (shape, scores) = outputs["span_scores"].try_extract_tensor::<f32>()?;
```

Requires `ort = "2.0.0-rc.12"`, `tokenizers = { version = "0.21", features = ["fancy-regex"] }`, and `regex = "1"` in Cargo.toml.
## Tokenizer

Use the `tokenizer.json` file included in this repository. It is identical to the one from the original PyTorch model (fastino/gliner2-large-v1).
## Available models
| Model | Backbone | ONNX size |
|---|---|---|
| lion-ai/gliner2-base-v1-onnx | DeBERTa-v3-base | ~825 MB |
| lion-ai/gliner2-multi-v1-onnx | mDeBERTa-v3-base | ~1220 MB |
| lion-ai/gliner2-large-v1-onnx | DeBERTa-v3-large | ~1931 MB |
## License

Apache 2.0 (see LICENSE).