gliner2-large-v1-onnx

ONNX export of fastino/gliner2-large-v1 for zero-shot Named Entity Recognition.

GLiNER2 can recognise any entity type at inference time, with no fine-tuning required. Simply pass a list of labels alongside your text, and the model scores every candidate span for each label.

This repository contains a single monolithic ONNX file (encoder + span head) that can be run with any ONNX Runtime backend (CPU, CUDA, CoreML, DirectML, etc.).

Model interface

Inputs

| Name | Shape | Type | Description |
|------|-------|------|-------------|
| input_ids | (1, seq_len) | int64 | Token IDs produced by the tokenizer with is_pretokenized=True, add_special_tokens=False |
| attention_mask | (1, seq_len) | int64 | Attention mask (1 for real tokens, 0 for padding) |
| text_positions | (num_words,) | int64 | Index of the first token of each word inside input_ids |
| schema_positions | (1 + num_fields,) | int64 | Index of the [P] token, then the index of each [E] token |
| span_idx | (1, num_words * max_width, 2) | int64 | All (start, end) word-index pairs with end - start <= max_width (pad with 0) |

Outputs

| Name | Shape | Type | Description |
|------|-------|------|-------------|
| span_scores | (1, num_fields, num_words, max_width) | float32 | Score for each (label, start_word, width) combination. Apply a threshold (e.g. 0.5) to extract entities. |
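The thresholding step can be sketched in plain Python (the function name is illustrative; in practice span_scores comes back as a NumPy array, but a nested list stands in here, with the batch dimension already stripped):

```python
def decode_spans(span_scores, labels, threshold=0.5):
    """Turn span_scores[field][start_word][width] into (label, start, end) triples.

    `end` is the inclusive end-word index (start + width).
    """
    entities = []
    for field_idx, label in enumerate(labels):
        for start, widths in enumerate(span_scores[field_idx]):
            for width, score in enumerate(widths):
                if score >= threshold:
                    entities.append((label, start, start + width))
    return entities

# Two labels, three words, max_width 2; only one score clears 0.5.
scores = [
    [[0.9, 0.1], [0.0, 0.0], [0.0, 0.0]],  # "person"
    [[0.0, 0.0], [0.0, 0.0], [0.2, 0.0]],  # "city"
]
print(decode_spans(scores, ["person", "city"]))  # [('person', 0, 0)]
```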

Constants

  • max_width = 8 (maximum span length in words)
  • [SEP_TEXT] token ID = 250103 (separates schema from text in the input sequence)
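One plausible way to build span_idx from these constants, assuming the width runs from 0 to max_width - 1 (which matches the num_words * max_width shape) and out-of-range spans are padded with (0, 0):

```python
def build_span_idx(num_words, max_width=8):
    """Enumerate (start, end) word-index pairs, max_width entries per start word.

    Spans whose end falls past the last word are padded with (0, 0), so the
    result always has exactly num_words * max_width entries.
    """
    spans = []
    for start in range(num_words):
        for width in range(max_width):
            end = start + width
            spans.append((start, end) if end < num_words else (0, 0))
    return spans

print(build_span_idx(3, max_width=2))
# [(0, 0), (0, 1), (1, 1), (1, 2), (2, 2), (0, 0)]
```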

Schema format

The input sequence is constructed as:

( [P] entities ( [E] label1 [E] label2 ... ) ) [SEP_TEXT] word1 word2 ...

The tokenizer from the original model (tokenizer.json) must be used with is_pretokenized=True and add_special_tokens=False.
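A sketch of assembling the pre-tokenized word list handed to the tokenizer, assuming each schema symbol (including the parentheses) is its own word, as the space-separated template above suggests; the function name is illustrative:

```python
def build_schema_words(labels, text_words):
    """Build the pre-tokenized word list: schema prefix, separator, then the text."""
    words = ["(", "[P]", "entities", "("]
    for label in labels:
        words += ["[E]", label]
    words += [")", ")", "[SEP_TEXT]"]
    return words + list(text_words)

print(build_schema_words(["person", "company"], ["steve", "jobs"]))
# ['(', '[P]', 'entities', '(', '[E]', 'person', '[E]', 'company',
#  ')', ')', '[SEP_TEXT]', 'steve', 'jobs']
```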

Word splitting

Words are split from the raw text using WhitespaceTokenSplitter:

import re
WORD_RE = re.compile(
    r"(?:https?://[^\s]+|www\.[^\s]+)"
    r"|[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"
    r"|@[a-z0-9_]+"
    r"|\w+(?:[-_]\w+)*"
    r"|\S",
    re.IGNORECASE,
)
words = [(m.group(), m.start(), m.end()) for m in WORD_RE.finditer(text)]
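For example, the splitter keeps URLs and e-mail addresses intact as single words:

```python
import re

WORD_RE = re.compile(
    r"(?:https?://[^\s]+|www\.[^\s]+)"
    r"|[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}"
    r"|@[a-z0-9_]+"
    r"|\w+(?:[-_]\w+)*"
    r"|\S",
    re.IGNORECASE,
)

text = "Email bob@example.com or visit https://example.com"
words = [m.group() for m in WORD_RE.finditer(text)]
print(words)
# ['Email', 'bob@example.com', 'or', 'visit', 'https://example.com']
```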

Usage

Python

See example.py for a complete runnable script. Summary:

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("tokenizer.json")
session = ort.InferenceSession("model.onnx")

text = "Steve Jobs founded Apple Inc. in Cupertino, California on April 1, 1976."
labels = ["person", "company", "city", "date"]

# 1. Split text into words
# 2. Build schema:  ( [P] entities ( [E] person [E] company ... ) ) [SEP_TEXT] word1 word2 ...
# 3. Tokenize with is_pretokenized=True, add_special_tokens=False
# 4. Build span_idx, text_positions, schema_positions
# 5. Run inference and threshold span_scores >= 0.5
# 6. Map word spans back to character offsets

outputs = session.run(None, {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
    "text_positions": text_positions,
    "schema_positions": schema_positions,
    "span_idx": span_idx,
})
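Step 4 hinges on knowing the first-token index of every word. A pure-Python sketch, assuming you have the per-token word indices (what the tokenizers encoding exposes as word_ids, with None for special tokens):

```python
def first_token_positions(word_ids):
    """Map each word index to the index of its first token in input_ids."""
    first = {}
    for tok_idx, word_idx in enumerate(word_ids):
        if word_idx is not None and word_idx not in first:
            first[word_idx] = tok_idx
    return first

# Word 0 spans tokens 0-1, word 1 is token 2, word 2 spans tokens 3-4.
first = first_token_positions([0, 0, 1, 2, 2])
print(first)  # {0: 0, 1: 2, 2: 3}
```

text_positions then takes these indices for the words after [SEP_TEXT], while schema_positions takes the index of the [P] word followed by each [E] word.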

Node.js

See example.mjs for a complete runnable script. Summary:

import * as ort from "onnxruntime-node";
import { Tokenizer } from "tokenizers";

// Load tokenizer (strip pre_tokenizer/decoder/post_processor/normalizer
// as the npm package doesn't support these custom wrappers)
const tokenizerJson = JSON.parse(fs.readFileSync("tokenizer.json", "utf-8"));
delete tokenizerJson.pre_tokenizer;
delete tokenizerJson.decoder;
delete tokenizerJson.post_processor;
delete tokenizerJson.normalizer;
const tokenizer = Tokenizer.fromString(JSON.stringify(tokenizerJson));

const session = await ort.InferenceSession.create("model.onnx");

// 1. Split text into words (lowercase)
// 2. Build schema: ( [P] entities ( [E] person [E] company ... ) ) [SEP_TEXT] word1 word2 ...
// 3. Encode each word individually with a ▁ (U+2581) prefix to mimic is_pretokenized
// 4. Build span_idx, text_positions, schema_positions as BigInt64Arrays
// 5. Run inference and threshold span_scores >= 0.5

const results = await session.run({
    input_ids:        new ort.Tensor("int64", inputIds, [1, seqLen]),
    attention_mask:   new ort.Tensor("int64", attentionMask, [1, seqLen]),
    text_positions:   new ort.Tensor("int64", textPositions, [numWords]),
    schema_positions: new ort.Tensor("int64", schemaPositions, [numSchemaPos]),
    span_idx:         new ort.Tensor("int64", spanIdx, [1, numWords * 8, 2]),
});

Requires npm install onnxruntime-node tokenizers.

Rust

See example.rs for a complete example using the ort crate. Summary:

use ort::session::Session;
use ort::value::Tensor;
use tokenizers::Tokenizer;

let tokenizer = Tokenizer::from_file("tokenizer.json")?;
let mut session = Session::builder()?.commit_from_file("model.onnx")?;

// 1. Split text into words (lowercase)
// 2. Build schema: ( [P] entities ( [E] person [E] company ... ) ) [SEP_TEXT] word1 word2 ...
// 3. Tokenize with Vec<&str> (pre-tokenized input)
// 4. Build span_idx, text_positions, schema_positions
// 5. Run inference and threshold span_scores >= 0.5

let outputs = session.run(ort::inputs![
    "input_ids"        => Tensor::from_array((vec![1, seq_len], token_ids))?,
    "attention_mask"   => Tensor::from_array((vec![1, seq_len], mask))?,
    "text_positions"   => Tensor::from_array((vec![num_words], text_pos))?,
    "schema_positions" => Tensor::from_array((vec![num_schema], schema_pos))?,
    "span_idx"         => Tensor::from_array((vec![1, num_spans, 2], spans))?,
])?;

let (shape, scores) = outputs["span_scores"].try_extract_tensor::<f32>()?;

Requires ort = "2.0.0-rc.12", tokenizers = { version = "0.21", features = ["fancy-regex"] }, regex = "1" in Cargo.toml.

Tokenizer

Use the tokenizer.json file included in this repository. It is identical to the one from the original PyTorch model (fastino/gliner2-large-v1).

Available models

| Model | Backbone | ONNX size |
|-------|----------|-----------|
| lion-ai/gliner2-base-v1-onnx | DeBERTa-v3-base | ~825 MB |
| lion-ai/gliner2-multi-v1-onnx | mDeBERTa-v3-base | ~1220 MB |
| lion-ai/gliner2-large-v1-onnx | DeBERTa-v3-large | ~1931 MB |

License

Apache 2.0; see LICENSE.
