Pramana Stage 0 (Full Merged Weights)
This folder contains full merged weights for the Stage 0 Nyaya-structured model. It does not require a LoRA adapter at inference time.
What is included
- model-00001-of-00002.safetensors
- model-00002-of-00002.safetensors
- model.safetensors.index.json
- config.json, tokenizer.json, tokenizer_config.json, chat_template.jinja
- nyaya-llama-3b-stage0-merged-q4.gguf (quantized full model for Ollama)
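After a manual download, it can help to confirm that every weight shard is present before loading. The sketch below assumes the usual sharded-checkpoint layout, in which model.safetensors.index.json carries a "weight_map" from tensor names to shard file names; the function name missing_shards is illustrative and not part of any library.

```python
import json
from pathlib import Path


def missing_shards(model_dir: str) -> list[str]:
    """Return shard files named in the index that are absent from model_dir."""
    root = Path(model_dir)
    index = json.loads((root / "model.safetensors.index.json").read_text())
    # "weight_map" maps each tensor name to the shard file that stores it.
    shards = sorted(set(index["weight_map"].values()))
    return [name for name in shards if not (root / name).is_file()]
```

Calling missing_shards on the local download directory should return an empty list when both .safetensors shards are in place.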
Base model
unsloth/llama-3.2-3b-instruct
Usage (Transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "qbz506/nyaya-llama-3b-stage0"
subfolder = "full/nyaya-llama-3b-stage0-merged"

model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    subfolder=subfolder,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    repo_id,
    subfolder=subfolder,
    use_fast=True,
)
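Once the model and tokenizer are loaded, generation could look like the following minimal sketch. It reuses the system prompt from the Ollama Modelfile in this README and decodes greedily to mirror temperature 0. The helper names build_messages and generate, and the max_new_tokens default of 1024, are assumptions chosen for illustration rather than settings documented for this model.

```python
def build_messages(problem: str) -> list[dict]:
    """Pair the README's system prompt with a user problem."""
    system = (
        "You are a Nyaya reasoning engine. "
        "Follow the exact output format provided."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": problem},
    ]


def generate(model, tokenizer, messages, max_new_tokens=1024):
    """Apply the chat template, decode greedily, return only the new text."""
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    # do_sample=False gives deterministic (greedy) decoding.
    outputs = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=False
    )
    # Slice off the prompt tokens so only the completion is decoded.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

With the objects loaded above, a call would be generate(model, tokenizer, build_messages("<your prompt>")).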
Usage (Ollama with GGUF)
Download nyaya-llama-3b-stage0-merged-q4.gguf, then:
cat > Modelfile <<EOM
FROM ./nyaya-llama-3b-stage0-merged-q4.gguf
SYSTEM """
You are a Nyaya reasoning engine. Follow the exact output format provided.
"""
PARAMETER temperature 0
PARAMETER top_p 1
EOM
ollama create nyaya-llama-3b-stage0-merged-q4 -f Modelfile
ollama run nyaya-llama-3b-stage0-merged-q4 "<your prompt>"
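The model created above can also be queried from a script through Ollama's local REST API. This is a minimal sketch, assuming Ollama is running on its default port (11434) and that the model name matches the one passed to ollama create; the helper names build_payload and ask are illustrative. The SYSTEM prompt and PARAMETER values from the Modelfile apply by default, so they are not repeated in the request.

```python
import json
import urllib.request

# Ollama's default local endpoint for single-prompt generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "nyaya-llama-3b-stage0-merged-q4") -> dict:
    """Build the JSON body for a non-streaming generate request."""
    # stream=False asks for one JSON object instead of a stream of chunks.
    return {"model": model, "prompt": prompt, "stream": False}


def ask(prompt: str) -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling ask("<your prompt>") requires the Ollama server to be running locally.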
Prompting
Use the exact Nyaya section headers for best adherence:
## Samshaya (Doubt Analysis)
## Pramana (Sources of Knowledge)
## Pancha Avayava (5-Member Syllogism)
## Tarka (Counterfactual Reasoning)
## Hetvabhasa (Fallacy Check)
## Nirnaya (Ascertainment)
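Because adherence depends on these exact strings, a small checker can flag responses that drop or reorder a section. The sketch below is illustrative only: the instruction wording in format_instructions is a hypothetical phrasing, not the format used in training, and the order check assumes the six phases should appear in the order listed above.

```python
NYAYA_HEADERS = [
    "## Samshaya (Doubt Analysis)",
    "## Pramana (Sources of Knowledge)",
    "## Pancha Avayava (5-Member Syllogism)",
    "## Tarka (Counterfactual Reasoning)",
    "## Hetvabhasa (Fallacy Check)",
    "## Nirnaya (Ascertainment)",
]


def format_instructions() -> str:
    """Text to append to a prompt, listing the six headers verbatim."""
    intro = "Answer using exactly these section headers, in this order:\n"
    return intro + "\n".join(NYAYA_HEADERS)


def check_headers(response: str) -> tuple[list[str], bool]:
    """Return (missing headers, whether the present ones appear in order)."""
    lines = [line.strip() for line in response.splitlines()]
    # Record the first line index at which each header appears.
    positions = {h: lines.index(h) for h in NYAYA_HEADERS if h in lines}
    missing = [h for h in NYAYA_HEADERS if h not in positions]
    found = [positions[h] for h in NYAYA_HEADERS if h in positions]
    return missing, found == sorted(found)
```

A response passes when check_headers returns an empty list and True.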
Intended use
This model is tuned for structured 6-phase Nyaya reasoning on logic-style problems. It is research-grade and optimized for format adherence over open-ended creativity.
Limitations
- Responses may be verbose due to strict format.
- Best results require the exact section headers and a system prompt.
- Not evaluated for safety-critical domains.
Paper: Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning through Navya-Nyaya
Citations
If you use this model/dataset, please cite:
@misc{sathish2026pramanafinetuninglargelanguage,
  title={Pramana: Fine-Tuning Large Language Models for Epistemic Reasoning through Navya-Nyaya},
  author={Sharath Sathish},
  year={2026},
  eprint={2604.04937},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2604.04937},
}