| library_name: transformers | |
| license: cc-by-nc-4.0 | |
| # Model Card for eternisai/Anonymizer-0.6B-gguf | |
| SLMs for semantically similar replacement of PII to provide better end-user privacy. | |
| ### Model description | |
| This is a **GGUF quantized version** of [eternisai/Anonymizer-0.6B](https://huggingface.co/eternisai/Anonymizer-0.6B), optimized for local inference with llama.cpp and compatible tools. | |
| The **Anonymizer-0.6B** is a lightweight privacy-preserving language model trained for **surgical anonymization of personal data** before queries leave your device. It detects and replaces sensitive information (names, companies, identifiers, financials, etc.) with semantically similar alternatives, while preserving query intent and meaning. | |
| This GGUF version is optimized for **CPU inference and local deployment**, making it ideal for privacy-focused applications where data cannot leave the device. It powers [Enchanted](http://link.freysa.ai/appstore) and other privacy-first applications. | |
| ## Intended use | |
| * **Primary use**: Local privacy-preserving anonymization before sending queries to external LLMs | |
| * **Secondary use**: Standalone anonymizer for edge deployments and research | |
| * **Good for**: Fast CPU inference, mobile/edge devices, air-gapped environments | |
| * **Not for**: General-purpose generation | |
| ## Training details | |
| * **Base**: Qwen3-0.6B (quantized to GGUF format) | |
| * **Original training**: ~30k samples covering PII replacement + non-replacement categories | |
| * **Method**: Supervised fine-tuning → GRPO with GPT-4.1 as judge | |
| * **Quantization**: Multiple precision levels available (Q4_K_M, Q5_K_M, Q8_0) | |
| ## Usage with llama.cpp | |
| ### Server deployment | |
| ```bash | |
| llama-server -m qwen3-06b-q4_K_M.gguf --flash-attn --ctx-size 4096 --cache-type-k q8_0 --cache-type-v q8_0 -ngl 99 --draft-max 12 --draft-min 8 --draft-p-min 0.5 -b 4096 --ubatch-size 1536 --reasoning-format none --reasoning-budget 0 --kv-unified --mlock -t -1 -tb 4 --poll 0 --port 8000 --jinja | |
| ``` | |
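Once the server above is running, it can be queried through llama-server's OpenAI-compatible `/v1/chat/completions` endpoint. The sketch below is a minimal example using only the Python standard library. The helper names `build_payload` and `anonymize_via_server` are hypothetical, the shortened system prompt stands in for the full task instruction from the Usage Example section, and it is assumed that the `--jinja` flag lets the server apply the model's chat template to the `tools` field.

```python
import json
import urllib.request

# Abbreviated stand-in (assumption): use the full task instruction
# from the Usage Example section in real use.
SYSTEM_PROMPT = (
    "You are an anonymizer. Your task is to identify and replace personally "
    "identifiable information (PII) in the given text."
)

# Same tool schema as in the Usage Example section.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "replace_entities",
        "description": "Replace PII entities with anonymized versions",
        "parameters": {
            "type": "object",
            "properties": {
                "replacements": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "original": {"type": "string"},
                            "replacement": {"type": "string"},
                        },
                        "required": ["original", "replacement"],
                    },
                }
            },
            "required": ["replacements"],
        },
    },
}]


def build_payload(query, temperature=0.3, max_tokens=250):
    """Build the chat-completions request body, appending the /no_think marker."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": query + "\n/no_think"},
        ],
        "tools": TOOLS,
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


def anonymize_via_server(query, base_url="http://localhost:8000"):
    """POST the payload to a running llama-server and return the decoded JSON reply."""
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_payload(query)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

Whether the tool call comes back as a parsed `tool_calls` entry or as raw text in the message `content` may depend on the llama.cpp version, so inspect the reply from `anonymize_via_server(...)` before relying on one field.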
| ### Basic inference | |
| ```bash | |
| ./llama-cli -m qwen3-06b-q4_K_M.gguf -p "Replace PII in this text: My name is John Smith and I work at Google." | |
| ``` | |
| ## Performance | |
| * **Inference speed**: <200ms first token (CPU), near-instant on GPU | |
| * **Memory usage**: ~400MB-1GB depending on quantization level | |
| * **Hardware**: Runs on CPU, mobile processors, and consumer GPUs | |
| ## Limitations | |
| * Performs worse on nuanced anonymization tasks compared to larger variants (1.7B/4B) | |
| * Edge cases (rare identifiers, subtle contextual PII) may be missed | |
| * Quantization may slightly reduce anonymization accuracy compared with the original model | |
| ## Usage Example | |
| ⚠️ **Important**: This model requires specific formatting using the tokenizer's chat template. Do not use raw prompts directly. | |
| ### Quick Start | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| import torch | |
| import json | |
| # Load model and tokenizer | |
| model_name = "eternisai/Anonymizer-0.6B" | |
| tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True) | |
| model = AutoModelForCausalLM.from_pretrained( | |
|     model_name, | |
|     torch_dtype=torch.float16, | |
|     device_map="auto", | |
|     trust_remote_code=True, | |
| ) | |
| # Define the task instruction | |
| TASK_INSTRUCTION = """You are an anonymizer. Your task is to identify and replace personally identifiable information (PII) in the given text. | |
| Replace PII entities with semantically equivalent alternatives that preserve the context needed for a good response. | |
| If no PII is found or replacement is not needed, return an empty replacements list. | |
| REPLACEMENT RULES: | |
| • Personal names: Replace private or small-group individuals. Pick same culture + gender + era; keep surnames aligned across family members. DO NOT replace globally recognised public figures (heads of state, Nobel laureates, A-list entertainers, Fortune-500 CEOs, etc.). | |
| • Companies / organisations: Replace private, niche, employer & partner orgs. Invent a fictitious org in the same industry & size tier; keep legal suffix. Keep major public companies (anonymity set ≥ 1,000,000). | |
| • Projects / codenames / internal tools: Always replace with a neutral two-word alias of similar length. | |
| • Locations: Replace street addresses, buildings, villages & towns < 100k pop with a same-level synthetic location inside the same state/country. Keep big cities (≥ 1M), states, provinces, countries, iconic landmarks. | |
| • Dates & times: Replace birthdays, meeting invites, exact timestamps. Shift day/month by small amounts while KEEPING THE SAME YEAR to maintain temporal context. DO NOT shift public holidays or famous historic dates ("July 4 1776", "Christmas Day", "9/11/2001", etc.). Keep years, fiscal quarters, decade references unchanged. | |
| • Identifiers: (emails, phone #s, IDs, URLs, account #s) Always replace with format-valid dummies; keep domain class (.com big-tech, .edu, .gov). | |
| • Monetary values: Replace personal income, invoices, bids by × [0.8 – 1.25] to keep order-of-magnitude. Keep public list prices & market caps. | |
| • Quotes / text snippets: If the quote contains PII, swap only the embedded tokens; keep the rest verbatim.""" | |
| # Define tool schema (required!) | |
| tools = [{ | |
|     "type": "function", | |
|     "function": { | |
|         "name": "replace_entities", | |
|         "description": "Replace PII entities with anonymized versions", | |
|         "parameters": { | |
|             "type": "object", | |
|             "properties": { | |
|                 "replacements": { | |
|                     "type": "array", | |
|                     "items": { | |
|                         "type": "object", | |
|                         "properties": { | |
|                             "original": {"type": "string"}, | |
|                             "replacement": {"type": "string"} | |
|                         }, | |
|                         "required": ["original", "replacement"] | |
|                     } | |
|                 } | |
|             }, | |
|             "required": ["replacements"] | |
|         } | |
|     } | |
| }] | |
| # Your query to anonymize | |
| query = "Hi, my son Elijah works at TechStartup Inc and makes $85,000 per year." | |
| # Format messages properly (critical step!) | |
| messages = [ | |
|     {"role": "system", "content": TASK_INSTRUCTION}, | |
|     {"role": "user", "content": query + "\n/no_think"}, | |
| ] | |
| # Apply chat template with tools | |
| formatted_prompt = tokenizer.apply_chat_template( | |
|     messages, | |
|     tools=tools, | |
|     tokenize=False, | |
|     add_generation_prompt=True, | |
| ) | |
| # Tokenize and generate | |
| inputs = tokenizer(formatted_prompt, return_tensors="pt", truncation=True).to(model.device) | |
| outputs = model.generate(**inputs, max_new_tokens=250, temperature=0.3, do_sample=True, top_p=0.9) | |
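| # Decode only the newly generated tokens so that prompt text (or the word "assistant" inside the output) cannot corrupt the result | |
| prompt_length = inputs["input_ids"].shape[1] | |
| response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=False) | |
| assistant_response = response.split("<|im_end|>")[0].strip() | |
| print("Response:", assistant_response) | |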
| # Expected output format: | |
| # <|tool_call|>{"name": "replace_entities", "arguments": {"replacements": [{"original": "Elijah", "replacement": "Nathan"}, {"original": "TechStartup Inc", "replacement": "DataSoft LLC"}, {"original": "$85,000", "replacement": "$72,000"}]}}</|tool_call|> | |
| ``` | |
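The monetary rule in the task instruction scales amounts by a factor between 0.8 and 1.25. As a quick arithmetic check, the expected output above replaces $85,000 with $72,000, a ratio of roughly 0.85, which falls inside that band. The helper below is a hypothetical sketch for checking this automatically; `parse_amount` and `within_monetary_band` are illustrative names, and the parser assumes plain amounts such as `$85,000` (no "k" suffixes, no zero values).

```python
import re


def parse_amount(text):
    """Turn a string such as "$85,000" into a float, dropping symbols and commas."""
    digits = re.sub(r"[^0-9.]", "", text)
    return float(digits)


def within_monetary_band(original, replacement, low=0.8, high=1.25):
    """Check that replacement / original lies inside [low, high]."""
    ratio = parse_amount(replacement) / parse_amount(original)
    return low <= ratio <= high


print(within_monetary_band("$85,000", "$72,000"))  # True (ratio is about 0.85)
print(within_monetary_band("$85,000", "$50,000"))  # False (ratio is about 0.59)
```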
| ### Parsing the Response | |
| ```python | |
| import json | |
| def parse_replacements(response): | |
|     """Extract replacements from model response""" | |
|     try: | |
|         if '<|tool_call|>' in response: | |
|             start = response.find('<|tool_call|>') + len('<|tool_call|>') | |
|             end = response.find('</|tool_call|>') | |
|         elif '<tool_call>' in response: | |
|             start = response.find('<tool_call>') + len('<tool_call>') | |
|             end = response.find('</tool_call>') | |
|         else: | |
|             return None | |
|         if end != -1: | |
|             json_str = response[start:end].strip() | |
|             tool_data = json.loads(json_str) | |
|             return tool_data.get('arguments', {}).get('replacements', []) | |
|         return None  # opening tag found, but no closing tag | |
|     except (json.JSONDecodeError, AttributeError): | |
|         # Malformed JSON, or JSON that is not an object | |
|         return None | |
| # Parse the response | |
| replacements = parse_replacements(assistant_response) | |
| if replacements: | |
|     for r in replacements: | |
|         print(f"Replace '{r['original']}' with '{r['replacement']}'") | |
| ``` | |
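Once the replacements are parsed, they still have to be applied to the query before it leaves the device, and reversed in whatever answer comes back. The model card does not show this step, so the following is only a sketch under assumptions: `apply_replacements` and `restore_originals` are hypothetical helper names, and plain substring replacement (longest match first) is assumed to be good enough.

```python
def apply_replacements(text, replacements):
    """Swap each original entity for its replacement, longest originals first."""
    ordered = sorted(replacements, key=lambda r: len(r["original"]), reverse=True)
    for r in ordered:
        text = text.replace(r["original"], r["replacement"])
    return text


def restore_originals(text, replacements):
    """Reverse the mapping, e.g. on a response returned by an external LLM."""
    ordered = sorted(replacements, key=lambda r: len(r["replacement"]), reverse=True)
    for r in ordered:
        text = text.replace(r["replacement"], r["original"])
    return text


# Values taken from the Quick Start example's expected output
example_query = "Hi, my son Elijah works at TechStartup Inc and makes $85,000 per year."
example_replacements = [
    {"original": "Elijah", "replacement": "Nathan"},
    {"original": "TechStartup Inc", "replacement": "DataSoft LLC"},
    {"original": "$85,000", "replacement": "$72,000"},
]

anonymized = apply_replacements(example_query, example_replacements)
print(anonymized)
# Hi, my son Nathan works at DataSoft LLC and makes $72,000 per year.
print(restore_originals(anonymized, example_replacements) == example_query)
# True
```

Sorting by length keeps a short entity (such as a first name) from clobbering part of a longer one (such as a full name). Sequential replacement can still chain if one replacement string contains another original, so a production implementation would likely need a single-pass substitution instead.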
| ### Output Format | |
| The model outputs tool calls in this format: | |
| **With PII detected:** | |
| ```json | |
| <|tool_call|> | |
| {"name": "replace_entities", "arguments": {"replacements": [ | |
| {"original": "John", "replacement": "Marcus"}, | |
| {"original": "Microsoft", "replacement": "TechCorp"}, | |
| {"original": "$5000", "replacement": "$4200"} | |
| ]}} | |
| </|tool_call|> | |
| ``` | |
| **No PII detected:** | |
| ```json | |
| <|tool_call|> | |
| {"name": "replace_entities", "arguments": {"replacements": []}} | |
| </|tool_call|> | |
| ``` | |
| ## Important Notes | |
| 1. **Chat Template Required**: The model will NOT work with raw prompts. You must use `tokenizer.apply_chat_template()` with the tools parameter. | |
| 2. **Tool Schema Required**: The tools schema must be provided to the chat template for proper formatting. | |
| 3. **Special Marker**: User queries need the `/no_think` marker appended. | |
| 4. **Response Format**: The model outputs structured tool calls wrapped in `<|tool_call|>` tags (or `<tool_call>` in some versions). | |
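The first three notes above lend themselves to a quick pre-flight check before generation. The helper below is a hypothetical sketch (the name `check_request` and its messages are not part of the model's tooling); it only inspects the `messages` and `tools` structures in the shape used by the Quick Start example.

```python
def check_request(messages, tools):
    """Return a list of problems; an empty list means the request looks well-formed."""
    problems = []

    # Note 2: the tool schema must be passed to the chat template
    if not tools:
        problems.append("tools schema is missing")
    elif not any(t.get("function", {}).get("name") == "replace_entities" for t in tools):
        problems.append("tools schema does not define replace_entities")

    # Note 3: the user query needs the /no_think marker appended
    user_messages = [m for m in messages if m.get("role") == "user"]
    if not user_messages:
        problems.append("no user message found")
    elif not user_messages[-1].get("content", "").rstrip().endswith("/no_think"):
        problems.append("last user message does not end with /no_think")

    # The Quick Start example passes the task instruction as a system message
    if not any(m.get("role") == "system" for m in messages):
        problems.append("system message with the task instruction is missing")

    return problems
```

Running `check_request(messages, tools)` right before `tokenizer.apply_chat_template(...)` gives an early, readable error list instead of malformed model output.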
| ## Common Issues | |
| **Issue**: Model outputs gibberish or doesn't follow the format | |
| **Solution**: Ensure you're using `apply_chat_template` with the tools parameter | |
| **Issue**: Model doesn't detect obvious PII | |
| **Solution**: Make sure to append `/no_think` to the user query | |
| **Issue**: Getting errors about missing tools | |
| **Solution**: The tools schema is required - see the example above | |
| ## Technical Details | |
| The model was trained using the Qwen3 chat template format with tool calling capabilities. The internal prompt structure (shown below for reference) is automatically generated by the tokenizer - **do not construct this manually**: | |
| <details> | |
| <summary>Internal prompt structure (auto-generated, for reference only)</summary> | |
| ``` | |
| [BEGIN OF TASK INSTRUCTION] | |
| You are an anonymizer. Your task is to identify and replace personally identifiable information (PII)... | |
| [END OF TASK INSTRUCTION] | |
| [BEGIN OF AVAILABLE TOOLS] | |
| [{"type": "function", "function": {"name": "replace_entities", ...}}] | |
| [END OF AVAILABLE TOOLS] | |
| [BEGIN OF FORMAT INSTRUCTION] | |
| Use the replace_entities tool to specify replacements... | |
| [END OF FORMAT INSTRUCTION] | |
| [BEGIN OF QUERY] | |
| Your text to anonymize goes here | |
| /no_think | |
| [END OF QUERY] | |
| ``` | |
| This structure is created automatically when you use `tokenizer.apply_chat_template()` - never construct it manually. | |
| </details> | |
| ## Model variants | |
| For different performance needs: | |
| - **[Anonymizer-0.6B](https://huggingface.co/eternisai/Anonymizer-0.6B)**: Original PyTorch model | |
| - **[Anonymizer-1.7B](https://huggingface.co/eternisai/Anonymizer-1.7B)**: Higher accuracy variant | |
| - **[Anonymizer-4B](https://huggingface.co/eternisai/Anonymizer-4B)**: Maximum accuracy variant |