TOBA LLM

Model description

TOBA LLM is a language model built upon the TOBA (Tokenisasi Optimal Berbasis Aglutinasi, roughly "optimal agglutination-based tokenization") tokenization scheme. This approach is inspired by the Gasing Literacy Learning System (https://gasingacademy.org/), an educational framework designed to teach Indonesian by integrating reading, writing, and pronunciation while addressing the local characteristics of the language.

The TOBA tokenization is optimized for the agglutinative nature of Indonesian. By integrating principles from human literacy education with computational optimization, TOBA LLM offers a highly efficient and linguistically nuanced approach to language processing. This convergence of pedagogical principles and advanced language modeling techniques makes TOBA LLM particularly suited for tasks requiring a deep understanding of Indonesian, such as educational tools, natural language processing applications, and content generation.
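To illustrate what agglutination-aware tokenization means in practice, the sketch below splits Indonesian words into prefixes, a root, and suffixes by greedy affix matching. This is a deliberately naive illustration of the general idea, not the actual TOBA algorithm; the affix lists and length heuristic are assumptions chosen for the demo.

```python
# Illustrative only: a naive affix-aware splitter for Indonesian.
# This is NOT the TOBA algorithm, just a sketch of the underlying idea
# that agglutinative words decompose into morpheme-like units.

PREFIXES = ["mem", "ber", "ter", "per", "pe", "di", "ke", "se"]  # assumed demo list
SUFFIXES = ["kan", "an", "i", "nya"]                             # assumed demo list

def split_affixes(word):
    """Split a word into (prefixes, root, suffixes) by greedy matching.

    An affix is only stripped if at least 4 characters of root remain,
    a crude guard against over-segmentation.
    """
    prefixes, suffixes = [], []
    root = word
    changed = True
    while changed:
        changed = False
        for p in PREFIXES:
            if root.startswith(p) and len(root) - len(p) >= 4:
                prefixes.append(p)
                root = root[len(p):]
                changed = True
                break
    changed = True
    while changed:
        changed = False
        for s in SUFFIXES:
            if root.endswith(s) and len(root) - len(s) >= 4:
                suffixes.append(s)
                root = root[:-len(s)]
                changed = True
                break
    # Suffixes were stripped outer-first; reverse to read left to right.
    return prefixes, root, list(reversed(suffixes))
```

For example, "mempercepat" ("to speed up") decomposes into the prefixes "mem-" and "per-" around the root "cepat" ("fast"). A tokenizer that respects such boundaries can reuse the same root token across many derived forms, which is the efficiency TOBA targets.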

This variant has 1.5B parameters and supports a context length of 1024 tokens; inputs longer than this must be truncated before inference.

Usage

The inference script, infer.py, supports two modes: completion and chat.

Setup

Python 3.8 or higher is required. To install the necessary dependencies:

pip install -r requirements.txt

Completion Mode

Generates a continuation of a single input prompt.

python infer.py completion

After execution, a prompt can be entered in the terminal. The model will generate a corresponding completion.
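Completion can also be invoked programmatically. The snippet below is a sketch, assuming the model is published on the Hugging Face Hub under the repo id ai-toba/toba-llm-1.5B and loads through the standard transformers AutoModelForCausalLM API; the repo id, generation settings, and truncation helper are assumptions, not taken from infer.py.

```python
# Sketch of programmatic completion. The repo id below is an assumption;
# only the 1024-token context length comes from the model card.

MODEL_ID = "ai-toba/toba-llm-1.5B"  # assumed Hub repo id
CONTEXT_LENGTH = 1024               # context window stated in the model card

def truncate_to_context(token_ids, max_new_tokens=128, context_length=CONTEXT_LENGTH):
    """Keep only the most recent tokens so prompt + generation fit the window."""
    budget = context_length - max_new_tokens
    return token_ids[-budget:]

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    prompt = input("Prompt: ")
    ids = truncate_to_context(tokenizer.encode(prompt))
    out = model.generate(torch.tensor([ids]), max_new_tokens=128)
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```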

Chat Mode

Enables multi-turn interaction with the model in a conversational format.

python infer.py chat

The model maintains conversational context across turns. Press Ctrl+C to exit the session.
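One common way to maintain conversational context is to flatten the turn history into a single growing prompt on every turn. The sketch below shows that pattern; the "User:/Assistant:" template is an illustrative assumption, and the actual format used by infer.py may differ.

```python
# Sketch of carrying multi-turn context by re-serializing the full history.
# The "User:/Assistant:" template is an assumption, not infer.py's format.

def build_chat_prompt(history, user_message):
    """Flatten prior (user, assistant) turns plus the new message into one prompt."""
    parts = []
    for user_turn, assistant_turn in history:
        parts.append(f"User: {user_turn}")
        parts.append(f"Assistant: {assistant_turn}")
    parts.append(f"User: {user_message}")
    parts.append("Assistant:")
    return "\n".join(parts)

def chat_loop(generate_fn):
    """Interactive loop; generate_fn maps a prompt string to the model's reply."""
    history = []
    try:
        while True:
            message = input("You: ")
            reply = generate_fn(build_chat_prompt(history, message))
            print(f"Model: {reply}")
            history.append((message, reply))
    except KeyboardInterrupt:
        print("\nSession ended.")  # Ctrl+C exits, mirroring infer.py chat
```

Because the serialized history grows with every turn, a real implementation must also truncate or summarize old turns once the 1024-token context window fills up.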

Reference

arxiv.org/abs/2601.11643
