Instructions for using unsloth/DeepSeek-R1-Distill-Qwen-32B with libraries, inference providers, notebooks, and local apps. The sections below cover each option.
- Libraries
- Transformers
How to use unsloth/DeepSeek-R1-Distill-Qwen-32B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="unsloth/DeepSeek-R1-Distill-Qwen-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("unsloth/DeepSeek-R1-Distill-Qwen-32B")
model = AutoModelForCausalLM.from_pretrained("unsloth/DeepSeek-R1-Distill-Qwen-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
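The direct-load snippet above places the full model on one device in default precision; a 32B checkpoint is roughly 64 GB in 16-bit, so sharding is usually necessary. A minimal variant of the load step, assuming the `accelerate` package is installed (required for `device_map`):

# Shard the 32B checkpoint across available GPUs/CPU instead of one device.
# Requires the `accelerate` package for device_map support.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("unsloth/DeepSeek-R1-Distill-Qwen-32B")
model = AutoModelForCausalLM.from_pretrained(
    "unsloth/DeepSeek-R1-Distill-Qwen-32B",
    torch_dtype="auto",   # keep the checkpoint's native precision (not fp32)
    device_map="auto",    # place layers across GPUs/CPU as memory allows
)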
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use unsloth/DeepSeek-R1-Distill-Qwen-32B with vLLM:
Install from pip and serve the model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "unsloth/DeepSeek-R1-Distill-Qwen-32B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "unsloth/DeepSeek-R1-Distill-Qwen-32B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
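Because the server is OpenAI-compatible, the same request works from Python with the official `openai` client; a sketch assuming `pip install openai` (the API key is a placeholder, vLLM does not check it by default):

from openai import OpenAI

# Point the client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="unsloth/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)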
- SGLang
How to use unsloth/DeepSeek-R1-Distill-Qwen-32B with SGLang:
Install from pip and serve the model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "unsloth/DeepSeek-R1-Distill-Qwen-32B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "unsloth/DeepSeek-R1-Distill-Qwen-32B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
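SGLang's endpoint is OpenAI-compatible as well, so token-by-token streaming works through the standard client; a sketch assuming the server above is running and the `openai` package is installed:

from openai import OpenAI

# Stream tokens from the local SGLang server started above.
client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
stream = client.chat.completions.create(
    model="unsloth/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)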
Use Docker images

docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "unsloth/DeepSeek-R1-Distill-Qwen-32B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "unsloth/DeepSeek-R1-Distill-Qwen-32B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'

- Unsloth Studio (new)
How to use unsloth/DeepSeek-R1-Distill-Qwen-32B with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for unsloth/DeepSeek-R1-Distill-Qwen-32B to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex

# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for unsloth/DeepSeek-R1-Distill-Qwen-32B to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for unsloth/DeepSeek-R1-Distill-Qwen-32B to start chatting
Load model with FastModel
pip install unsloth

from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Qwen-32B",
    max_seq_length=2048,
)
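Once loaded, generation works like any transformers model; a minimal sketch, assuming the FastModel pair exposes the standard chat-template and generate API (as Unsloth-loaded models normally do):

# Build a chat prompt and generate a short reply.
messages = [{"role": "user", "content": "Who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:]))  # strip the prompt tokens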
- Docker Model Runner
How to use unsloth/DeepSeek-R1-Distill-Qwen-32B with Docker Model Runner:
docker model run hf.co/unsloth/DeepSeek-R1-Distill-Qwen-32B
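The runner also exposes an OpenAI-compatible API, so it can be scripted from Python. A sketch under the assumption that TCP host access is enabled on Docker's documented default port 12434; the port and path may differ on your installation, so check your Docker settings:

from openai import OpenAI

# Assumption: Docker Model Runner is reachable at its documented default
# host endpoint; adjust base_url if your installation differs.
client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="docker")
response = client.chat.completions.create(
    model="hf.co/unsloth/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)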
Fix chat_template crash when assistant message omits the `content` key
⚠️ This template will start crashing for every tool-calling user as soon as the next transformers release ships.
The upstream PR https://github.com/huggingface/transformers/pull/45422 normalizes message inputs by stripping content=None before rendering (None and absent are semantically identical, and content=None is exactly what the OpenAI API returns for tool-call-only messages). That normalization is correct, but it exposes a latent bug in this template: the tool_calls branch reads message['content'] directly, which raises when the key is absent.
Concretely, this code path is hit by any tool-calling pipeline (OpenAI-compatible servers, agent frameworks, function-calling demos) that produces assistant messages with tool_calls and no textual content. Today most of them happen to pass content=None explicitly and get away with it. After the transformers release, all of them break.
Repro
Today (works):
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("unsloth/DeepSeek-R1-Distill-Qwen-32B")
tok.apply_chat_template(
[
{"role": "user", "content": "What's the weather in Paris?"},
{"role": "assistant", "content": None, "tool_calls": [{
"type": "function",
"function": {"name": "get_weather", "arguments": '{"city":"Paris"}'},
}]},
],
tokenize=False,
)
# renders correctly
After https://github.com/huggingface/transformers/pull/45422 (same call, same input — transformers strips content=None before rendering, so the template sees an absent key and crashes):
UndefinedError: 'dict object' has no attribute 'content'
You can reproduce the post-release behavior today by simply omitting the content key.
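Same call, with the key dropped from the assistant message (reusing the `tok` from the snippet above):

tok.apply_chat_template(
    [
        {"role": "user", "content": "What's the weather in Paris?"},
        {"role": "assistant", "tool_calls": [{  # no "content" key at all
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"city":"Paris"}'},
        }]},
    ],
    tokenize=False,
)
# UndefinedError: 'dict object' has no attribute 'content'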
The fix
A one-line change: `message['content'] is none` → `message.get('content') is none`. `.get()` returns None whether the key is absent or set to None, so both cases are handled identically.
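The equivalence is easy to check outside transformers. A standalone sketch, under the assumption that transformers renders chat templates in a sandboxed Jinja environment that raises on undefined lookups (StrictUndefined stands in for that here):

from jinja2 import StrictUndefined
from jinja2.sandbox import ImmutableSandboxedEnvironment

# StrictUndefined mimics the strict rendering that surfaces the crash.
env = ImmutableSandboxedEnvironment(undefined=StrictUndefined)
buggy = env.from_string("{% if message['content'] is none %}tool-only{% endif %}")
fixed = env.from_string("{% if message.get('content') is none %}tool-only{% endif %}")

msg = {"role": "assistant", "tool_calls": []}  # no 'content' key
print(fixed.render(message=msg))  # -> tool-only
buggy.render(message=msg)  # UndefinedError: 'dict object' has no attribute 'content'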
Verified against a 14-case regression suite (single-turn, multi-turn, tool flows with/without final answers, multi-system, </think> reasoning, unicode, empty content): all cases either render bit-identically to the current template or, for the previously crashing case, render correctly. Zero regressions.
Disclaimer: this PR was opened as part of a scan for repos whose chat_template is derived from (or copies) the DeepSeek-R1 template, identified by the presence of the buggy substring `message['content'] is none`. The same one-line fix is proposed wherever that pattern appears verbatim. I do not maintain this model; please review before merging.