Instructions to use eternisai/Anonymizer-0.6B-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use eternisai/Anonymizer-0.6B-gguf with Transformers:

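# Load model directly (assuming the repo ships only GGUF weights: pass the file name via gguf_file; needs `pip install gguf`)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("eternisai/Anonymizer-0.6B-gguf", gguf_file="qwen3-06b-f16.gguf", dtype="auto")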

llama-cpp-python

How to use eternisai/Anonymizer-0.6B-gguf with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="eternisai/Anonymizer-0.6B-gguf",
	filename="qwen3-06b-f16.gguf",
)

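llm.create_chat_completion(
	messages=[
		# The model card below also expects a system prompt and a tools schema.
		{"role": "user", "content": "My name is John Smith and I work at Google.\n/no_think"}
	]
)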

Local Apps

llama.cpp

How to use eternisai/Anonymizer-0.6B-gguf with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf eternisai/Anonymizer-0.6B-gguf:F16
# Run inference directly in the terminal:
llama-cli -hf eternisai/Anonymizer-0.6B-gguf:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf eternisai/Anonymizer-0.6B-gguf:F16
# Run inference directly in the terminal:
llama-cli -hf eternisai/Anonymizer-0.6B-gguf:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf eternisai/Anonymizer-0.6B-gguf:F16
# Run inference directly in the terminal:
./llama-cli -hf eternisai/Anonymizer-0.6B-gguf:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf eternisai/Anonymizer-0.6B-gguf:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf eternisai/Anonymizer-0.6B-gguf:F16

Use Docker

docker model run hf.co/eternisai/Anonymizer-0.6B-gguf:F16

Ollama
How to use eternisai/Anonymizer-0.6B-gguf with Ollama:
```
ollama run hf.co/eternisai/Anonymizer-0.6B-gguf:F16
```

Unsloth Studio

How to use eternisai/Anonymizer-0.6B-gguf with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for eternisai/Anonymizer-0.6B-gguf to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for eternisai/Anonymizer-0.6B-gguf to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for eternisai/Anonymizer-0.6B-gguf to start chatting

Pi

How to use eternisai/Anonymizer-0.6B-gguf with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf eternisai/Anonymizer-0.6B-gguf:F16

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "eternisai/Anonymizer-0.6B-gguf:F16"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use eternisai/Anonymizer-0.6B-gguf with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf eternisai/Anonymizer-0.6B-gguf:F16

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default eternisai/Anonymizer-0.6B-gguf:F16

Run Hermes

hermes

Docker Model Runner
How to use eternisai/Anonymizer-0.6B-gguf with Docker Model Runner:
```
docker model run hf.co/eternisai/Anonymizer-0.6B-gguf:F16
```

Lemonade

How to use eternisai/Anonymizer-0.6B-gguf with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull eternisai/Anonymizer-0.6B-gguf:F16

Run and chat with the model

lemonade run user.Anonymizer-0.6B-gguf-F16

List all available models

lemonade list

Anonymizer-0.6B-gguf / README.md

pratyushrt

Update README.md

f5c2aea verified 9 months ago


	---
	library_name: transformers
	license: cc-by-nc-4.0
	---
	# Model Card for eternisai/Anonymizer-0.6B-gguf
	A small language model (SLM) that replaces PII with semantically similar alternatives to improve end-user privacy.
	## Model description

	This is a GGUF quantized version of [eternisai/Anonymizer-0.6B](https://huggingface.co/eternisai/Anonymizer-0.6B), optimized for local inference with llama.cpp and compatible tools.

	The Anonymizer-0.6B is a lightweight privacy-preserving language model trained for surgical anonymization of personal data before queries leave your device. It detects and replaces sensitive information (names, companies, identifiers, financials, etc.) with semantically similar alternatives, while preserving query intent and meaning.

	This GGUF version is optimized for CPU inference and local deployment, making it ideal for privacy-focused applications where data cannot leave the device. It powers [Enchanted](http://link.freysa.ai/appstore) and other privacy-first applications.

	## Intended use

	* Primary use: Local privacy-preserving anonymization before sending queries to external LLMs
	* Secondary use: Standalone anonymizer for edge deployments and research
	* Good for: Fast CPU inference, mobile/edge devices, air-gapped environments
	* Not for: General-purpose generation

	## Training details

	* Base: Qwen3-0.6B (quantized to GGUF format)
	* Original training: ~30k samples covering PII replacement + non-replacement categories
	* Method: Supervised fine-tuning → GRPO with GPT-4.1 as judge
	* Quantization: Multiple precision levels available (Q4_K_M, Q5_K_M, Q8_0)

	## Usage with llama.cpp

	### Server deployment
	```bash
	llama-server -m qwen3-06b-q4_K_M.gguf \
	  --flash-attn --ctx-size 4096 \
	  --cache-type-k q8_0 --cache-type-v q8_0 -ngl 99 \
	  --draft-max 12 --draft-min 8 --draft-p-min 0.5 \
	  -b 4096 --ubatch-size 1536 \
	  --reasoning-format none --reasoning-budget 0 \
	  --kv-unified --mlock -t -1 -tb 4 --poll 0 \
	  --port 8000 --jinja
	```

	### Basic inference
	```bash
	./llama-cli -m qwen3-06b-q4_K_M.gguf -p "Replace PII in this text: My name is John Smith and I work at Google."
	```
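
Once the server from the "Server deployment" command is running, it can also be queried over its OpenAI-compatible chat endpoint. The sketch below is a minimal example using only the Python standard library. It assumes the server listens on `http://localhost:8000` (the `--port 8000` flag above) and that `--jinja` lets the server apply the chat template to the `tools` field. The helper names `build_payload` and `anonymize_via_server` are hypothetical, and the tool schema is the one from the Usage Example section.

```python
import json
import urllib.request

# Assumption: llama-server was started with --port 8000 as in the command above.
SERVER_URL = "http://localhost:8000/v1/chat/completions"

# Same tool schema as in the Usage Example section.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "replace_entities",
        "description": "Replace PII entities with anonymized versions",
        "parameters": {
            "type": "object",
            "properties": {
                "replacements": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "original": {"type": "string"},
                            "replacement": {"type": "string"},
                        },
                        "required": ["original", "replacement"],
                    },
                }
            },
            "required": ["replacements"],
        },
    },
}]


def build_payload(query, system_prompt):
    """Build the JSON body for an OpenAI-style chat completion request."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query + "\n/no_think"},
        ],
        "tools": TOOLS,
        "temperature": 0.3,
        "max_tokens": 250,
    }


def anonymize_via_server(query, system_prompt, url=SERVER_URL, timeout=30):
    """POST the request to a running llama-server and return the decoded JSON reply."""
    body = json.dumps(build_payload(query, system_prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

With the server running, `anonymize_via_server(query, TASK_INSTRUCTION)` should return a reply whose `choices[0]["message"]` carries the tool call; the exact shape of that message depends on the llama.cpp version, so inspect it before relying on a particular field.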

	## Performance

	* Inference speed: <200ms first token (CPU), near-instant on GPU
	* Memory usage: ~400MB-1GB depending on quantization level
	* Hardware: Runs on CPU, mobile processors, and consumer GPUs
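
As a rough cross-check of the memory range above, weight size scales with parameter count times bits per weight. The sketch below is back-of-the-envelope arithmetic, not a measurement: the parameter count (about 0.6 billion) is taken from the model name, and the bits-per-weight values are approximate figures commonly quoted for llama.cpp quantization types, assumed here for illustration. Real files tend to be somewhat larger because some tensors are kept at higher precision, and runtime memory adds the KV cache and compute buffers.

```python
# Approximate bits per weight for each quantization type (assumed values).
APPROX_BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
    "F16": 16.0,
}

N_PARAMS = 0.6e9  # assumption: ~0.6B parameters, from the model name


def estimate_weight_mb(n_params, bits_per_weight):
    """Return the approximate weight size in megabytes (1 MB = 1e6 bytes)."""
    return n_params * bits_per_weight / 8 / 1e6


for name, bits in APPROX_BITS_PER_WEIGHT.items():
    print(f"{name}: ~{estimate_weight_mb(N_PARAMS, bits):.0f} MB")
```

Under these assumptions the quantized weights land at roughly 360 to 640 MB and F16 at about 1.2 GB, which is broadly in line with the range quoted above.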

	## Limitations

	* Performs worse on nuanced anonymization tasks compared to larger variants (1.7B/4B)
	* Edge cases (rare identifiers, subtle contextual PII) may be missed
	* Quantization may slightly reduce anonymization accuracy vs original model
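
Because edge cases may be missed, a cheap post-check on the anonymized text can catch some obvious leftovers. The sketch below is a hypothetical safety net and not part of the model: it checks that none of the `original` strings still appear and flags anything that looks like an email address. The function name `find_leftovers` and the deliberately simple email pattern are assumptions for illustration.

```python
import re

# Deliberately simple pattern; an illustrative assumption, not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_leftovers(anonymized_text, replacements, allowed_emails=()):
    """Return a list of strings that suggest PII survived anonymization."""
    leftovers = []
    # 1. Any `original` value still present means a replacement was not applied.
    for item in replacements:
        if item["original"] in anonymized_text:
            leftovers.append(item["original"])
    # 2. Email-like strings that are not known replacement values.
    known = {item["replacement"] for item in replacements} | set(allowed_emails)
    for match in EMAIL_RE.findall(anonymized_text):
        if match not in known and match not in leftovers:
            leftovers.append(match)
    return leftovers
```

An empty result does not prove the text is clean; it only means these two simple checks found nothing. A substring check can also raise false alarms when a replacement happens to contain the original string.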

	## Usage Example

	⚠️ Important: This model requires specific formatting using the tokenizer's chat template. Do not use raw prompts directly.

	### Quick Start

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch
	import json

	# Load model and tokenizer (the original PyTorch checkpoint, not the GGUF files)
	model_name = "eternisai/Anonymizer-0.6B"
	tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	    model_name,
	    torch_dtype=torch.float16,
	    device_map="auto",
	    trust_remote_code=True
	)

	# Define the task instruction
	TASK_INSTRUCTION = """You are an anonymizer. Your task is to identify and replace personally identifiable information (PII) in the given text.
	Replace PII entities with semantically equivalent alternatives that preserve the context needed for a good response.
	If no PII is found or replacement is not needed, return an empty replacements list.

	REPLACEMENT RULES:
	• Personal names: Replace private or small-group individuals. Pick same culture + gender + era; keep surnames aligned across family members. DO NOT replace globally recognised public figures (heads of state, Nobel laureates, A-list entertainers, Fortune-500 CEOs, etc.).
	• Companies / organisations: Replace private, niche, employer & partner orgs. Invent a fictitious org in the same industry & size tier; keep legal suffix. Keep major public companies (anonymity set ≥ 1,000,000).
	• Projects / codenames / internal tools: Always replace with a neutral two-word alias of similar length.
	• Locations: Replace street addresses, buildings, villages & towns < 100k pop with a same-level synthetic location inside the same state/country. Keep big cities (≥ 1M), states, provinces, countries, iconic landmarks.
	• Dates & times: Replace birthdays, meeting invites, exact timestamps. Shift day/month by small amounts while KEEPING THE SAME YEAR to maintain temporal context. DO NOT shift public holidays or famous historic dates ("July 4 1776", "Christmas Day", "9/11/2001", etc.). Keep years, fiscal quarters, decade references unchanged.
	• Identifiers: (emails, phone #s, IDs, URLs, account #s) Always replace with format-valid dummies; keep domain class (.com big-tech, .edu, .gov).
	• Monetary values: Replace personal income, invoices, bids by × [0.8 – 1.25] to keep order-of-magnitude. Keep public list prices & market caps.
	• Quotes / text snippets: If the quote contains PII, swap only the embedded tokens; keep the rest verbatim."""

	# Define tool schema (required!)
	tools = [{
	    "type": "function",
	    "function": {
	        "name": "replace_entities",
	        "description": "Replace PII entities with anonymized versions",
	        "parameters": {
	            "type": "object",
	            "properties": {
	                "replacements": {
	                    "type": "array",
	                    "items": {
	                        "type": "object",
	                        "properties": {
	                            "original": {"type": "string"},
	                            "replacement": {"type": "string"}
	                        },
	                        "required": ["original", "replacement"]
	                    }
	                }
	            },
	            "required": ["replacements"]
	        }
	    }
	}]

	# Your query to anonymize
	query = "Hi, my son Elijah works at TechStartup Inc and makes $85,000 per year."

	# Format messages properly (critical step!)
	messages = [
	    {"role": "system", "content": TASK_INSTRUCTION},
	    {"role": "user", "content": query + "\n/no_think"}
	]

	# Apply chat template with tools
	formatted_prompt = tokenizer.apply_chat_template(
	    messages,
	    tools=tools,
	    tokenize=False,
	    add_generation_prompt=True
	)

	# Tokenize and generate
	inputs = tokenizer(formatted_prompt, return_tensors="pt", truncation=True).to(model.device)
	outputs = model.generate(**inputs, max_new_tokens=250, temperature=0.3, do_sample=True, top_p=0.9)

	# Decode and extract response
	response = tokenizer.decode(outputs[0], skip_special_tokens=False)
	assistant_response = response.split("assistant")[-1].split("<|im_end|>")[0].strip()

	print("Response:", assistant_response)
	# Expected output format:
	# <|tool_call|>{"name": "replace_entities", "arguments": {"replacements": [{"original": "Elijah", "replacement": "Nathan"}, {"original": "TechStartup Inc", "replacement": "DataSoft LLC"}, {"original": "$85,000", "replacement": "$72,000"}]}}</|tool_call|>
	```
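
The monetary rule in the task instruction (scale by a factor in [0.8, 1.25]) can be checked mechanically. The following sketch, with hypothetical helper names, parses simple dollar amounts and checks the ratio for the replacement shown in the expected output above. It assumes amounts formatted like `$85,000` and is not part of the model's own tooling.

```python
import re


def parse_amount(text):
    """Parse a simple currency string such as "$85,000" or "$4200.50" into a float."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return float(match.group(0).replace(",", ""))


def within_scaling_rule(original, replacement, low=0.8, high=1.25):
    """Check that replacement / original lies in [low, high]."""
    ratio = parse_amount(replacement) / parse_amount(original)
    return low <= ratio <= high


ratio = parse_amount("$72,000") / parse_amount("$85,000")
print(f"ratio = {ratio:.3f}, ok = {within_scaling_rule('$85,000', '$72,000')}")
# ratio = 0.847, ok = True
```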

	### Parsing the Response

	```python
	import json  # needed by json.loads below

	def parse_replacements(response):
	    """Extract replacements from model response; return None if no valid tool call is found"""
	    try:
	        if '<|tool_call|>' in response:
	            start = response.find('<|tool_call|>') + len('<|tool_call|>')
	            end = response.find('</|tool_call|>')
	        elif '<tool_call>' in response:
	            start = response.find('<tool_call>') + len('<tool_call>')
	            end = response.find('</tool_call>')
	        else:
	            return None

	        if end != -1:
	            json_str = response[start:end].strip()
	            tool_data = json.loads(json_str)
	            return tool_data.get('arguments', {}).get('replacements', [])
	    except (json.JSONDecodeError, AttributeError):
	        # Malformed JSON, or the payload was not a JSON object
	        return None
	    return None  # closing tag missing

	# Parse the response
	replacements = parse_replacements(assistant_response)
	if replacements:
	    for r in replacements:
	        print(f"Replace '{r['original']}' with '{r['replacement']}'")
	```
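
Once the replacements are parsed, they still have to be applied to the query before it leaves the device. The sketch below assumes plain substring replacement is sufficient; the function name `apply_replacements` is hypothetical. Longer originals are applied first so that a short entity does not clobber part of a longer one. The example list is copied from the expected output in Quick Start.

```python
def apply_replacements(text, replacements):
    """Return `text` with every `original` swapped for its `replacement`."""
    # Longest originals first, so "John Smith" is handled before "John".
    ordered = sorted(replacements, key=lambda r: len(r["original"]), reverse=True)
    for item in ordered:
        text = text.replace(item["original"], item["replacement"])
    return text


query = "Hi, my son Elijah works at TechStartup Inc and makes $85,000 per year."
replacements = [
    {"original": "Elijah", "replacement": "Nathan"},
    {"original": "TechStartup Inc", "replacement": "DataSoft LLC"},
    {"original": "$85,000", "replacement": "$72,000"},
]
print(apply_replacements(query, replacements))
# Hi, my son Nathan works at DataSoft LLC and makes $72,000 per year.
```

An empty list leaves the text unchanged, which matches the case where no PII is detected.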

	### Output Format

	The model outputs tool calls in this format:

	With PII detected:
	```json
	<|tool_call|>
	{"name": "replace_entities", "arguments": {"replacements": [
	  {"original": "John", "replacement": "Marcus"},
	  {"original": "Microsoft", "replacement": "TechCorp"},
	  {"original": "$5000", "replacement": "$4200"}
	]}}
	</|tool_call|>
	```

	No PII detected:
	```json
	<|tool_call|>
	{"name": "replace_entities", "arguments": {"replacements": []}}
	</|tool_call|>
	```

	## Important Notes

	1. Chat Template Required: The model will NOT work with raw prompts. You must use `tokenizer.apply_chat_template()` with the tools parameter.

	2. Tool Schema Required: The tools schema must be provided to the chat template for proper formatting.

	3. Special Marker: User queries need the `/no_think` marker appended.

	4. Response Format: The model outputs structured tool calls wrapped in `<|tool_call|>` tags (or `<tool_call>` in some versions).
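
The notes on message formatting can be folded into one small helper so the marker is never forgotten or doubled. This is a sketch with a hypothetical function name; it only builds the message list, which would then be passed to `tokenizer.apply_chat_template(..., tools=tools)` as in Quick Start. The short system prompt in the example is a placeholder for the full task instruction.

```python
NO_THINK = "/no_think"


def build_messages(query, task_instruction):
    """Return chat messages with the /no_think marker appended exactly once."""
    user_text = query.rstrip()
    if not user_text.endswith(NO_THINK):
        user_text = f"{user_text}\n{NO_THINK}"
    return [
        {"role": "system", "content": task_instruction},
        {"role": "user", "content": user_text},
    ]


messages = build_messages("My name is John Smith.", "You are an anonymizer.")
print(messages[1]["content"])
```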

	## Common Issues

	Issue: Model outputs gibberish or doesn't follow the format
	Solution: Ensure you're using `apply_chat_template` with the tools parameter

	Issue: Model doesn't detect obvious PII
	Solution: Make sure to append `/no_think` to the user query

	Issue: Getting errors about missing tools
	Solution: The tools schema is required - see the example above

	## Technical Details

	The model was trained using the Qwen3 chat template format with tool calling capabilities. The internal prompt structure (shown below for reference) is automatically generated by the tokenizer - do not construct this manually:

	<details>
	<summary>Internal prompt structure (auto-generated, for reference only)</summary>

	```
	[BEGIN OF TASK INSTRUCTION]
	You are an anonymizer. Your task is to identify and replace personally identifiable information (PII)...
	[END OF TASK INSTRUCTION]

	[BEGIN OF AVAILABLE TOOLS]
	[{"type": "function", "function": {"name": "replace_entities", ...}}]
	[END OF AVAILABLE TOOLS]

	[BEGIN OF FORMAT INSTRUCTION]
	Use the replace_entities tool to specify replacements...
	[END OF FORMAT INSTRUCTION]

	[BEGIN OF QUERY]
	Your text to anonymize goes here
	/no_think
	[END OF QUERY]
	```

	This structure is created automatically when you use `tokenizer.apply_chat_template()` - never construct it manually.
	</details>

	## Model variants

	For different performance needs:
	- [Anonymizer-0.6B](https://huggingface.co/eternisai/Anonymizer-0.6B): Original PyTorch model
	- [Anonymizer-1.7B](https://huggingface.co/eternisai/Anonymizer-1.7B): Higher accuracy variant
	- [Anonymizer-4B](https://huggingface.co/eternisai/Anonymizer-4B): Maximum accuracy variant