# gliner2-large-v1-onnx

ONNX export of fastino/gliner2-large-v1 for zero-shot Named Entity Recognition.

GLiNER2 can recognise any entity type at inference time, with no fine-tuning required. Simply pass a list of labels alongside your text, and the model scores every candidate span for each label.

This repository contains a single monolithic ONNX file (encoder + span head) that can be run with any ONNX Runtime backend (CPU, CUDA, CoreML, DirectML, etc.).
## Model interface

### Inputs

| Name | Shape | Type | Description |
|---|---|---|---|
| `input_ids` | `(1, seq_len)` | int64 | Token IDs produced by the tokenizer with `is_pretokenized=True`, `add_special_tokens=False` |
| `attention_mask` | `(1, seq_len)` | int64 | Attention mask (1 for real tokens, 0 for padding) |
| `text_positions` | `(num_words,)` | int64 | Index of the first token of each word inside `input_ids` |
| `schema_positions` | `(1 + num_fields,)` | int64 | Index of the `[P]` token, then the index of each `[E]` token |
| `span_idx` | `(1, num_words * max_width, 2)` | int64 | All `(start, end)` word-index pairs with `end - start <= max_width` (padded with 0) |
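As a sketch of how `span_idx` can be constructed: the start-major, then-width row ordering below is an assumption inferred from the `(1, num_words * max_width, 2)` shape, so verify it against example.py.

```python
import numpy as np

def build_span_idx(num_words: int, max_width: int = 8) -> np.ndarray:
    """Enumerate all candidate (start, end) word spans, padded with (0, 0).

    Assumes rows are ordered by start word, then by span width, matching
    the (1, num_words * max_width, 2) shape the model expects.
    """
    spans = np.zeros((num_words * max_width, 2), dtype=np.int64)
    for start in range(num_words):
        for width in range(max_width):
            end = start + width  # inclusive end word index
            if end < num_words:
                spans[start * max_width + width] = (start, end)
            # spans running past the last word stay (0, 0) as padding
    return spans[None, :, :]  # add the leading batch dimension

span_idx = build_span_idx(num_words=3, max_width=2)
```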
### Outputs

| Name | Shape | Type | Description |
|---|---|---|---|
| `span_scores` | `(1, num_fields, num_words, max_width)` | float32 | Score for each (label, start_word, width) combination. Apply a threshold (e.g. 0.5) to extract entities. |
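The thresholding step can be sketched like this; the helper name and output format are illustrative, not part of the model's API:

```python
import numpy as np

def decode_spans(span_scores, labels, words, threshold=0.5):
    """Turn (1, num_fields, num_words, max_width) scores into (label, text, score) triples."""
    _, num_fields, num_words, max_width = span_scores.shape
    entities = []
    for f in range(num_fields):
        for start in range(num_words):
            for width in range(max_width):
                end = start + width  # inclusive end word index
                if end >= num_words:
                    continue  # padded span, never a real entity
                score = float(span_scores[0, f, start, width])
                if score >= threshold:
                    entities.append((labels[f], " ".join(words[start:end + 1]), score))
    return entities

# Toy example: 2 labels, 3 words, max_width 2, one span above threshold.
scores = np.zeros((1, 2, 3, 2), dtype=np.float32)
scores[0, 1, 0, 1] = 0.75  # label index 1, words 0..1
print(decode_spans(scores, ["person", "city"], ["new", "york", "city"]))
```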
## Constants

- `max_width = 8` (maximum span length in words)
- `[SEP_TEXT]` token ID = 250103 (separates schema from text in the input sequence)
## Schema format

The input sequence is constructed as:

```
( [P] entities ( [E] label1 [E] label2 ... ) ) [SEP_TEXT] word1 word2 ...
```

The tokenizer from the original model (`tokenizer.json`) must be used with `is_pretokenized=True` and `add_special_tokens=False`.
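Building that prefix programmatically is straightforward. This sketch assumes the bracketed layout shown above, keeping each piece as a separate word so it can be fed to pre-tokenized encoding:

```python
def build_schema_words(labels):
    """Schema prefix as a list of words, ready for is_pretokenized encoding."""
    words = ["(", "[P]", "entities", "("]
    for label in labels:
        words += ["[E]", label]
    words += [")", ")", "[SEP_TEXT]"]
    return words

schema = build_schema_words(["person", "city"])
```

The text's own words are then appended after the `[SEP_TEXT]` entry to form the full pre-tokenized input.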
## Word splitting

Words are split from the raw text using `WhitespaceTokenSplitter`:

```python
import re

WORD_RE = re.compile(
    r"(?:https?://[^\s]+|www\.[^\s]+)"          # URLs
    r"|[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"   # e-mail addresses
    r"|@[a-z0-9_]+"                             # @handles
    r"|\w+(?:[-_]\w+)*"                         # words, incl. hyphen/underscore compounds
    r"|\S",                                     # any other single non-space character
    re.IGNORECASE,
)

words = [(m.group(), m.start(), m.end()) for m in WORD_RE.finditer(text)]
```
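For example, the splitter keeps URLs and e-mail addresses as single words (the regex is repeated here so the snippet is self-contained):

```python
import re

WORD_RE = re.compile(
    r"(?:https?://[^\s]+|www\.[^\s]+)"
    r"|[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"
    r"|@[a-z0-9_]+"
    r"|\w+(?:[-_]\w+)*"
    r"|\S",
    re.IGNORECASE,
)

text = "Email john.doe@example.com or visit https://example.com/docs"
words = [m.group() for m in WORD_RE.finditer(text)]
# → ['Email', 'john.doe@example.com', 'or', 'visit', 'https://example.com/docs']
```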
## Usage

### Python

See example.py for a complete runnable script. Summary:

```python
import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
session = ort.InferenceSession("model.onnx")

text = "Steve Jobs founded Apple Inc. in Cupertino, California on April 1, 1976."
labels = ["person", "company", "city", "date"]

# 1. Split text into words
# 2. Build schema: ( [P] entities ( [E] person [E] company ... ) ) [SEP_TEXT] word1 word2 ...
# 3. Tokenize with is_pretokenized=True, add_special_tokens=False
# 4. Build span_idx, text_positions, schema_positions
# 5. Run inference and threshold span_scores >= 0.5
# 6. Map word spans back to character offsets

outputs = session.run(None, {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
    "text_positions": text_positions,
    "schema_positions": schema_positions,
    "span_idx": span_idx,
})
```
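Step 6 maps word-level spans back to character offsets using the `(word, start, end)` triples the word splitter already produces. A minimal sketch (the helper name is illustrative):

```python
def to_char_offsets(word_offsets, start_word, end_word):
    """Map an inclusive word-index span to character offsets.

    word_offsets is the list of (word, char_start, char_end) triples
    produced by WORD_RE.finditer over the original text.
    """
    return word_offsets[start_word][1], word_offsets[end_word][2]

offsets = [("Steve", 0, 5), ("Jobs", 6, 10), ("founded", 11, 18)]
print(to_char_offsets(offsets, 0, 1))  # span "Steve Jobs" → (0, 10)
```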
### Node.js

See example.mjs for a complete runnable script. Summary:

```javascript
import fs from "node:fs";
import * as ort from "onnxruntime-node";
import { Tokenizer } from "tokenizers";

// Load the tokenizer, stripping pre_tokenizer/decoder/post_processor/normalizer,
// as the npm package doesn't support these custom wrappers.
const tokenizerJson = JSON.parse(fs.readFileSync("tokenizer.json", "utf-8"));
delete tokenizerJson.pre_tokenizer;
delete tokenizerJson.decoder;
delete tokenizerJson.post_processor;
delete tokenizerJson.normalizer;
const tokenizer = Tokenizer.fromString(JSON.stringify(tokenizerJson));

const session = await ort.InferenceSession.create("model.onnx");

// 1. Split text into words (lowercase)
// 2. Build schema: ( [P] entities ( [E] person [E] company ... ) ) [SEP_TEXT] word1 word2 ...
// 3. Encode each word individually with a ▁ (U+2581) prefix to mimic is_pretokenized
// 4. Build span_idx, text_positions, schema_positions as BigInt64Arrays
// 5. Run inference and threshold span_scores >= 0.5

const results = await session.run({
  input_ids: new ort.Tensor("int64", inputIds, [1, seqLen]),
  attention_mask: new ort.Tensor("int64", attentionMask, [1, seqLen]),
  text_positions: new ort.Tensor("int64", textPositions, [numWords]),
  schema_positions: new ort.Tensor("int64", schemaPositions, [numSchemaPos]),
  span_idx: new ort.Tensor("int64", spanIdx, [1, numWords * 8, 2]),
});
```

Requires `npm install onnxruntime-node tokenizers`.
### Rust

See example.rs for a complete example using the ort crate. Summary:

```rust
use ort::session::Session;
use ort::value::Tensor;
use tokenizers::Tokenizer;

let tokenizer = Tokenizer::from_file("tokenizer.json")?;
let mut session = Session::builder()?.commit_from_file("model.onnx")?;

// 1. Split text into words (lowercase)
// 2. Build schema: ( [P] entities ( [E] person [E] company ... ) ) [SEP_TEXT] word1 word2 ...
// 3. Tokenize with Vec<&str> (pre-tokenized input)
// 4. Build span_idx, text_positions, schema_positions
// 5. Run inference and threshold span_scores >= 0.5

let outputs = session.run(ort::inputs![
    "input_ids" => Tensor::from_array((vec![1, seq_len], token_ids))?,
    "attention_mask" => Tensor::from_array((vec![1, seq_len], mask))?,
    "text_positions" => Tensor::from_array((vec![num_words], text_pos))?,
    "schema_positions" => Tensor::from_array((vec![num_schema], schema_pos))?,
    "span_idx" => Tensor::from_array((vec![1, num_spans, 2], spans))?,
])?;
let (shape, scores) = outputs["span_scores"].try_extract_tensor::<f32>()?;
```

Requires `ort = "2.0.0-rc.12"`, `tokenizers = { version = "0.21", features = ["fancy-regex"] }`, and `regex = "1"` in Cargo.toml.
## Tokenizer

Use the `tokenizer.json` file included in this repository. It is identical to the one from the original PyTorch model (fastino/gliner2-large-v1).
## Available models
| Model | Backbone | ONNX size |
|---|---|---|
| lion-ai/gliner2-base-v1-onnx | DeBERTa-v3-base | ~825 MB |
| lion-ai/gliner2-multi-v1-onnx | mDeBERTa-v3-base | ~1220 MB |
| lion-ai/gliner2-large-v1-onnx | DeBERTa-v3-large | ~1931 MB |
## License

Apache 2.0 (see LICENSE).