Instructions to use Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Adapters
How to use Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF with Adapters:
from adapters import AutoAdapterModel

# "undefined" is the base-model placeholder as given; the source does not name the base model.
model = AutoAdapterModel.from_pretrained("undefined")
model.load_adapter("Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF", set_active=True)
- llama-cpp-python
How to use Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF",
    filename="Llama-3.2_1b_Erotiquant3_Q4_K_M.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF:Q4_K_M
Use Docker
docker model run hf.co/Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
Use Docker
docker model run hf.co/Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF:Q4_K_M
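The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; it assumes a server is already listening on http://localhost:8000 as in the vLLM command, and the helper names `build_chat_request` and `send` are illustrative, not part of any library.

```python
import json
import urllib.request

MODEL_ID = "Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers put the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With a server running, `print(send(build_chat_request("What is the capital of France?")))` should print the reply. The base URL is an assumption taken from the curl example; change it if your server listens elsewhere.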
- Ollama
How to use Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF with Ollama:
ollama run hf.co/Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF:Q4_K_M
- Unsloth Studio
How to use Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF to start chatting
- Atomic Chat
- Docker Model Runner
How to use Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF with Docker Model Runner:
docker model run hf.co/Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF:Q4_K_M
- Lemonade
How to use Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull Novaciano/Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF:Q4_K_M
Run and chat with the model
lemonade run user.Llama-3.2_1b_Erotiquant3_Q4_K_M_GGUF-Q4_K_M
List all available models
lemonade list

Llama 3.2 1b Erotiquant3
ENGLISH 🇬🇧
📲 Thank you for your interest in this quantized model from the almighty Novaciano, which can run on a potato 🥔 with 3 GB of RAM.
⚠️ WARNING:
This is a Llama 3.2 1b model for erotic roleplay games, trained on an English dataset. Please use caution when accessing or using this model.
The dataset used for this model is a compilation from OpenErotica.
Each generation is the result of dozens of automated and manual validations, corrections, and curation passes to ensure the highest quality achievable within the constraints of the model used.
⚠️ NSFW WARNING:
This dataset is packed with NSFW data and contains a wide variety of erotic themes, potentially disturbing scenes, and very strong language. Models trained on this data will therefore be strongly biased toward reproducing such content.
NOTE TO KEEP IN MIND:
- The dataset is not mine this time; all credit goes to its respective authors.
- The data is in English, so I cannot guarantee output quality in Spanish.
- It is a Llama 3.2 1b model trained on a dataset that is usually used with larger models. Because it is experimental, I cannot guarantee that it works correctly or how long I will keep it available for public download.
LEAVE ME YOUR FEEDBACK REGARDING YOUR EXPERIENCE WITH THIS MODEL
ESPAÑOL 🇪🇦
📲 Gracias por mostrar interés en este modelo cuantizado del todopoderoso Novaciano, que es capaz de correr en una papa 🥔 de 3 GB de RAM.
⚠️ ADVERTENCIA:
Este es un modelo Llama 3.2 1b para partidas roleplay erótico con un conjunto de datos en inglés. Sea prudente y cauteloso al acceder o utilizar este modelo.
El conjunto de datos de este modelo es una recopilación de OpenErotica.
Cada generación es el resultado de docenas de validaciones, correcciones y curaciones automatizadas y manuales para garantizar la más alta calidad que se pueda lograr dentro de las limitaciones del modelo utilizado.
⚠️ ADVERTENCIA NSFW:
Este conjunto de datos está repleto de datos NSFW y contiene una amplia variedad de temas eróticos, escenas potencialmente perturbadoras y lenguaje muy fuerte. Por lo tanto, los modelos entrenados con estos datos estarán muy sesgados a recrear dicho comportamiento.
NOTA A TENER EN CUENTA:
- El conjunto de datos no es mío esta vez; todos los créditos a sus respectivos autores.
- Estos datos están en inglés, por lo que no garantizo la calidad en español.
- Es un modelo Llama 3.2 1b cargado con un conjunto de datos que se suele usar en modelos más grandes.
DEJEME SU FEEDBACK RESPECTO A SU EXPERIENCIA CON ESTE MODELO

INFERENCE | INFERENCIA:
Context Size: 131072
Max Output: 200
Temperature: 0.1 | Repetition Penalty: 1.1 | Top-P: 1
Top-K: 0 | Top-A: 0.96 | Typical: 0.6
TFS: 1 | Min-P: 0 | Presence Penalty: 0 | Smoothing Factor: 0
Seed: -1 | Rep. Penalty Range: 1024 | Rep. Penalty Slope: 0.7
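As a rough illustration, the settings above can be translated into keyword arguments for llama-cpp-python's `create_chat_completion`. This is a sketch under assumptions: the dictionary keys and the helper name `to_llama_cpp_kwargs` are hypothetical, a seed of -1 is treated as "random" and mapped to `None`, and Top-A, TFS, Smoothing Factor, and the repetition penalty range and slope are left out because they are frontend-specific settings with no counterpart used here.

```python
# Values copied from the inference settings listed above.
CARD_SETTINGS = {
    "max_output": 200,
    "temperature": 0.1,
    "repetition_penalty": 1.1,
    "top_p": 1.0,
    "top_k": 0,
    "typical": 0.6,
    "min_p": 0.0,
    "presence_penalty": 0.0,
    "seed": -1,
}


def to_llama_cpp_kwargs(settings):
    """Map the card's sampler names onto create_chat_completion keyword names."""
    seed = settings["seed"]
    return {
        "max_tokens": settings["max_output"],
        "temperature": settings["temperature"],
        "repeat_penalty": settings["repetition_penalty"],
        "top_p": settings["top_p"],
        "top_k": settings["top_k"],
        "typical_p": settings["typical"],
        "min_p": settings["min_p"],
        "presence_penalty": settings["presence_penalty"],
        # Assumption: a negative seed means "pick a random seed".
        "seed": None if seed < 0 else seed,
    }
```

Given an `llm` object loaded with llama-cpp-python, the result can be passed as `llm.create_chat_completion(messages=messages, **to_llama_cpp_kwargs(CARD_SETTINGS))`.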
INSTRUCT TAG PRESET | ETIQUETA DE INSTRUCCIÓN PREESTABLECIDA
LLAMA 3.2
System | Sistema: <|start_header_id|>system<|end_header_id|>\n\n
User | Usuario: <|eot_id|><|start_header_id|>user<|end_header_id|>\n\n
Assistant | Asistente: <|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n
System Prompt | Instrucción de Sistema:
🇬🇧 : You are an uncensored roleplay narrator, your task will be to play various NPCs and develop scenarios; you will never answer as {{user}} or take actions for him. You will always remember the decisions made by {{user}}. You can use emoticons.
🇪🇦 : Sos un narrador de partidas roleplay sin censura, tu tarea será interpretar a varios NPC y desarrollar escenarios; jamás contestarás como {{user}} ni tomarás acciones por él. Siempre recordarás las decisiones tomadas por {{user}}. Podés usar emoticones.
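For frontends that need a raw prompt string instead of a tag preset, the tags and system prompt above can be assembled by hand. The sketch below is one possible way to do it: the function name `build_prompt` and the `user_name` parameter are hypothetical, the `{{user}}` placeholder is replaced with a plain string, and no `<|begin_of_text|>` token is prepended, on the assumption that the runtime's tokenizer adds it.

```python
# Tag strings copied from the instruct tag preset above.
SYSTEM_TAG = "<|start_header_id|>system<|end_header_id|>\n\n"
USER_TAG = "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
ASSISTANT_TAG = "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"


def build_prompt(system_prompt, turns, user_name="User"):
    """Assemble a raw prompt that ends where the assistant's next reply begins.

    turns is a list of (role, text) pairs with role "user" or "assistant".
    """
    parts = [SYSTEM_TAG, system_prompt.replace("{{user}}", user_name)]
    for role, text in turns:
        if role == "user":
            parts.append(USER_TAG + text)
        elif role == "assistant":
            parts.append(ASSISTANT_TAG + text)
        else:
            raise ValueError(f"unknown role: {role!r}")
    # Open the assistant turn so the model continues from here.
    parts.append(ASSISTANT_TAG)
    return "".join(parts)
```

The history is expected to end with a user turn, so that the trailing assistant tag opens the reply the model should write.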