Instructions for using kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm with libraries and local apps.

Libraries

How to use kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm")
model = AutoModelForCausalLM.from_pretrained("kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
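
For a decoder-only model such as this one, `generate` returns the prompt tokens followed by the new tokens, which is why the final `print` slices from the prompt length onward. A minimal sketch of the same indexing with plain lists follows; the helper name `strip_prompt` and the token ids are made up for illustration and are not real tokenizer output.

```python
def strip_prompt(output_ids, prompt_len):
    """Return only the newly generated token ids.

    Mirrors outputs[0][inputs["input_ids"].shape[-1]:] from the snippet above.
    """
    return output_ids[prompt_len:]


# Hypothetical ids: 4 prompt tokens followed by 3 generated tokens.
prompt_ids = [101, 7, 8, 9]
output_ids = prompt_ids + [42, 43, 2]

new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)  # [42, 43, 2]
```

Passing `skip_special_tokens=True` to `tokenizer.decode` additionally removes special markers from the printed text.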

Local Apps

vLLM

How to use kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
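
The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the server is reachable at `http://localhost:8000` as in the curl call; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build a POST request for the /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the decoded JSON response.

    Only works while the server is running.
    """
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        return json.load(resp)


# Build the request without sending it, to inspect what would go over the wire.
req = build_chat_request("What is the capital of France?")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

With the server started, `chat("What is the capital of France?")` sends the request. Passing a different `base_url` points the helper at a server listening on another host or port.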

Use Docker

docker model run hf.co/kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm

SGLang

How to use kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
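
The server answers in the OpenAI chat-completions shape, where the reply text sits under `choices[0].message.content`. A minimal sketch of pulling it out follows; the helper name `extract_reply` is illustrative, and the sample body is hand-written, not real model output.

```python
import json


def extract_reply(response_json):
    """Return the assistant text from a chat-completions response body."""
    data = json.loads(response_json)
    return data["choices"][0]["message"]["content"]


# Hand-written sample in the chat-completions shape; not real model output.
sample = json.dumps({
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "placeholder reply"}}
    ]
})
print(extract_reply(sample))  # placeholder reply
```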

Docker Model Runner
How to use kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm with Docker Model Runner:
```
docker model run hf.co/kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm
```
Quantized versions are available for use in llama.cpp, Ollama, LM Studio, or any compatible app.

This is a glorious and graceful gift to the open-source community from PyThess meetups, with love. It’s designed to provide sarcastic non-answers. Use with caution, and don’t trust it. Do not use it seriously—or at all. Do not expect it to qualify as a “helpful assistant.”

Built on top of Llama-3.2-1B-Instruct

Fine-tuned on a dataset of short, sarcastic "answers" to questions.

Original author - https://huggingface.co/AlexandrosChariton

To test:

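import torch
from transformers import pipeline

# Build a chat-style text-generation pipeline for the fine-tuned model.
pipe = pipeline(
    "text-generation",
    model="kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm",
    torch_dtype=torch.float32,
    device_map="auto",  # place the model on a GPU if one is available
)
messages = [
    {"role": "user", "content": "Why do I even bother with Python? Is it any good?"},
]
outputs = pipe(
    messages,
    max_new_tokens=128,
)
# The pipeline returns the whole conversation; the last message is the model's reply.
print(outputs[0]["generated_text"][-1])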

Example input: "Should I move to Scandinavia?"

Response: {'role': 'assistant', 'content': "Oh yes, because nothing says 'good life' like freezing your butt off. And the cost of living? A whole other story. You might even need a warm coat. Worth a shot? Probably not. Scandinavia is all about embracing the cold. You'll love it. You'll hate it. Either way, you'll be fine. Or not. Who knows. It's all part of the adventure. Right?"}
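
The response is a message dict, so the text itself is under its `content` key. A minimal sketch of extracting it from a conversation in the shape shown above follows; the helper name `reply_text` is illustrative, and the reply is shortened from the example response.

```python
def reply_text(generated_text):
    """Return the content of the last message if it came from the assistant."""
    last = generated_text[-1]
    if last.get("role") != "assistant":
        raise ValueError("last message is not an assistant reply")
    return last["content"]


# Conversation in the shape the pipeline returns; reply shortened from the example.
conversation = [
    {"role": "user", "content": "Should I move to Scandinavia?"},
    {
        "role": "assistant",
        "content": "Oh yes, because nothing says 'good life' like freezing your butt off.",
    },
]
print(reply_text(conversation))
```

With the test snippet above, the equivalent call would be `reply_text(outputs[0]["generated_text"])`.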


Safetensors
Model size: 1B params
Tensor type: F32

Model tree for kedar-bhumkar/meta-llama-3.2-1B-Instruct-ft-sarcasm
Base model: meta-llama/Llama-3.2-1B-Instruct
Finetuned (1753): this model
Quantizations: 1 model