Instructions to use Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF", filename="NVIDIA-Nemotron-Nano-9B-v2-BF16.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M
Use Docker
docker model run hf.co/Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF with Ollama:
ollama run hf.co/Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M
- Unsloth Studio
How to use Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF to start chatting
- Pi
How to use Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M
Run Hermes
hermes
- Atomic Chat new
- Docker Model Runner
How to use Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF with Docker Model Runner:
docker model run hf.co/Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M
- Lemonade
How to use Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.NVIDIA-Nemotron-Nano-9B-v2-GGUF-Q4_K_M
List all available models
lemonade list
Quantized version of nvidia/NVIDIA-Nemotron-Nano-9B-v2
ะัะธ ะบะฒะฐะฝัะธะทะฐัะธะธ ะธัะฟะพะปัะทะพะฒะฐะปะฐัั imatrix, ะบะพัะฟัั ัะตะบััะพะฒ ะดะปั ะฝะตั ะฑัะป ัะพะทะดะฐะฝ ัะปะตะดัััะธะผ ะพะฑัะฐะทะพะผ:
import json
import re
import hashlib
import time
from datasets import load_dataset
OUT_PATH = "calib_nemotron.jsonl"
TARGET_SAMPLES = 7000
CHUNK_SIZE = 900
MIN_LEN = 300
MAX_LEN = 3000
out = open(OUT_PATH, "w", encoding="utf-8")
seen = set()
written = 0
# -----------------------
# SAFE LOAD
# -----------------------
def safe_load(*args, **kwargs):
for i in range(5):
try:
return load_dataset(*args, **kwargs)
except Exception as e:
print("retry", i, e)
time.sleep(2)
raise RuntimeError("failed to load dataset")
# -----------------------
# CLEAN
# -----------------------
def clean_text(txt: str) -> str:
if not txt:
return ""
txt = re.sub(r"<[^>]+>", " ", txt)
txt = re.sub(r"\s+", " ", txt).strip()
if "\x00" in txt:
return ""
return txt
# -----------------------
# DEDUP
# -----------------------
def is_duplicate(txt: str) -> bool:
h = hashlib.blake2b(txt.encode("utf-8"), digest_size=8).hexdigest()
if h in seen:
return True
seen.add(h)
return False
# -----------------------
# CHUNK
# -----------------------
def split_chunks(txt: str):
for i in range(0, len(txt), CHUNK_SIZE):
chunk = txt[i:i + CHUNK_SIZE]
if len(chunk) >= MIN_LEN:
yield chunk
# -----------------------
# WRITE
# -----------------------
def process_text(txt: str):
global written
txt = clean_text(txt)
if not txt or len(txt) < MIN_LEN:
return
chunks = split_chunks(txt) if len(txt) > MAX_LEN else [txt]
for chunk in chunks:
if written >= TARGET_SAMPLES:
return
if is_duplicate(chunk):
continue
out.write(json.dumps({"text": chunk}, ensure_ascii=False) + "\n")
written += 1
# -----------------------
# CHAT
# -----------------------
def handle_chat(ds, ratio):
global written
target = int(TARGET_SAMPLES * ratio)
start = written
for x in ds:
if written - start >= target:
break
conv = x.get("conversations")
if not conv:
continue
txt = "\n".join(
f"{m.get('from','')}: {m.get('value','')}"
for m in conv if m.get("value")
)
process_text(txt)
# -----------------------
# TEXT
# -----------------------
def handle_text(ds, field, ratio):
global written
target = int(TARGET_SAMPLES * ratio)
start = written
for x in ds:
if written - start >= target:
break
process_text(x.get(field))
# -----------------------
# CODE
# -----------------------
def handle_code(ds, lang, ratio):
global written
target = int(TARGET_SAMPLES * ratio)
start = written
for x in ds:
if written - start >= target:
break
if x.get("lang") == lang:
process_text(x.get("content"))
# =======================
# DATASETS (ONLY SAFE ONES)
# =======================
print("chat...")
ds = safe_load("teknium/OpenHermes-2.5", split="train", streaming=True)
handle_chat(ds, 0.35)
print("en text...")
ds = safe_load("wikitext", "wikitext-103-raw-v1", split="train", streaming=True)
handle_text(ds, "text", 0.25)
print("ru fallback (wiki dump alternative)...")
# ะฑะตะทะพะฟะฐัะฝะฐั ะทะฐะผะตะฝะฐ RU:
ds = safe_load("wikimedia/wikipedia", "20231101.ru", split="train", streaming=True)
handle_text(ds, "text", 0.2)
print("rust...")
ds = safe_load("bigcode/the-stack-smol", split="train", streaming=True)
handle_code(ds, "Rust", 0.1)
print("python...")
ds = safe_load("bigcode/the-stack-smol", split="train", streaming=True)
handle_code(ds, "Python", 0.1)
out.close()
print("written:", written)
- Downloads last month
- 649
2-bit
3-bit
4-bit
6-bit
8-bit
16-bit
Model tree for Lauarvik/NVIDIA-Nemotron-Nano-9B-v2-GGUF
Base model
nvidia/NVIDIA-Nemotron-Nano-12B-v2-Base