# Deploying openPangu-R-72B-2512 with Omni-Infer


## Hardware Environment and Deployment Mode
PD co-located deployment (prefill and decode share the same nodes); only 4 dies of a single Atlas 800T A3 machine are required.
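Before deploying, it can help to confirm that the host actually sees the dies. A quick check with the Ascend `npu-smi` tool (it is installed with the driver, and its exact output format varies by driver version):

```shell
# List the NPUs visible on the host; fall back to a hint if npu-smi is absent.
# npu-smi ships with the Ascend driver, so a missing binary usually means a driver problem.
OUT=$(npu-smi info 2>/dev/null || echo "npu-smi not found: check the Ascend driver installation")
echo "$OUT"
```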


## Code and Image
- Omni-Infer code version: release_v0.7.0
- Matching image: see the v0.7.0 images at https://gitee.com/omniai/omniinfer/releases. For A3 hardware on the arm architecture, for example, run `docker pull swr.cn-east-4.myhuaweicloud.com/omni/omniinfer-a3-arm:release_v0.7.0-vllm`.


## Deployment
### 1. Start the container
```bash
IMAGE=swr.cn-east-4.myhuaweicloud.com/omni/omniinfer-a3-arm:release_v0.7.0-vllm
NAME=omniinfer-v0.7.0 # Custom container name
NPU_NUM=16 # Number of dies on the A3 node
DEVICE_ARGS=$(for i in $(seq 0 $((NPU_NUM-1))); do echo -n "--device /dev/davinci${i} "; done)

# Run the container using the variables defined above
# Note: if you run the container on a Docker bridge network, expose the ports needed for multi-node communication in advance
# To prevent device interference from other Docker containers, keep the "--privileged" argument
docker run -itd \
    --name=${NAME} \
    --network host \
    --privileged \
    --ipc=host \
    $DEVICE_ARGS \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
    -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /mnt/:/mnt/ \
    -v /data:/data \
    -v /home/work:/home/work \
    --entrypoint /bin/bash \
    ${IMAGE}
```
Make sure the model weights and this project's code are accessible from inside the container. Enter the container:
```bash
docker exec -it $NAME /bin/bash
```
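As a quick sanity check inside the container, the passed-through device nodes and driver mount can be verified. This sketch simply counts the `davinci` device nodes exposed by the `--device` arguments above; it is a convenience check, not part of the official procedure:

```shell
# Count the davinci device nodes passed through by the --device arguments
DEVICES=$(ls /dev/davinci[0-9]* 2>/dev/null | wc -l)
echo "visible davinci devices: ${DEVICES}"
# The driver mount from the docker run command should also be present
test -d /usr/local/Ascend/driver && echo "driver mount ok" || echo "driver mount missing"
```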


### 2. Copy examples/start_serving_openpangu_r_72b_2512.sh into omniinfer/tools/scripts and run it

```bash
git clone -b release_v0.7.0 https://gitee.com/omniai/omniinfer.git
cd omniinfer/tools/scripts
# Edit the serving script first: set model-path (path to the model weights), master-ip (this machine's IP address), and PYTHONPATH.
bash start_serving_openpangu_r_72b_2512.sh
```
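Loading the 72B weights can take a while, so the service is not ready the moment the script starts. A polling sketch against the `/health` endpoint of vLLM's OpenAI-compatible server (port 8000 and the endpoint path are assumptions here; adjust them to match the serving script):

```shell
# Poll until the service answers; tune RETRIES/SLEEP via the environment.
# A 72B checkpoint can take minutes to load, so raise RETRIES for real deployments.
PORT=${PORT:-8000}
RETRIES=${RETRIES:-10}
SLEEP=${SLEEP:-2}
ready=no
for i in $(seq 1 "$RETRIES"); do
  if curl -sf --max-time 2 "http://127.0.0.1:${PORT}/health" >/dev/null; then
    ready=yes
    break
  fi
  sleep "$SLEEP"
done
echo "service ready: ${ready}"
```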

### 3. Send a test request

Once the service is up, you can send a test request.

```bash
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openpangu_r_72b_2512",
    "messages": [
        {
            "role": "user",
            "content": "Who are you?"
        }
    ],
    "temperature": 1.0,
    "top_p": 0.8,
    "top_k": -1,
    "vllm_xargs": {"top_n_sigma": 0.05},
    "chat_template_kwargs": {"think": true, "reasoning_effort": "low"}
}'
```
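The endpoint returns the OpenAI-compatible response schema. Below is a sketch of extracting the assistant message from a captured response; the `RESPONSE` value is a hypothetical, abbreviated example, and `python3` is assumed to be available:

```shell
# Hypothetical response in the OpenAI-compatible shape returned by the endpoint above
RESPONSE='{"choices":[{"message":{"role":"assistant","content":"I am openPangu."}}]}'
# Pull out choices[0].message.content with the Python json module
CONTENT=$(echo "$RESPONSE" | python3 -c 'import sys, json; print(json.load(sys.stdin)["choices"][0]["message"]["content"])')
echo "$CONTENT"
```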
```bash
# Tool-calling example
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openpangu_r_72b_2512",
    "messages": [
        {"role": "system", "content": "You are the Pangu model developed by Huawei.\nToday is July 30, 2025."},
        {"role": "user", "content": "What will the weather be like in Shenzhen tomorrow?"}
    ],
    "tools": [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given city, including temperature, humidity, wind speed, and other data.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. Beijing or Shenzhen. Chinese or pinyin input is supported."
                    },
                    "date": {
                        "type": "string",
                        "description": "Query date in YYYY-MM-DD format (ISO 8601), e.g. 2023-10-01."
                    }
                },
                "required": ["location", "date"],
                "additionalProperties": false
            }
        }
    }
    ],
    "temperature": 1.0,
    "top_p": 0.8,
    "top_k": -1,
    "vllm_xargs": {"top_n_sigma": 0.05},
    "chat_template_kwargs": {"think": true, "reasoning_effort": "high"}
}'
```
The model defaults to slow-thinking (deep reasoning) mode. In this mode the chain of thought supports tiered effort levels: setting "reasoning_effort" to "high" or "low" in the request field "chat_template_kwargs": {"think": true, "reasoning_effort": "high"} trades accuracy against efficiency.
Slow-thinking mode itself can be enabled or disabled via the request field "chat_template_kwargs": {"think": true/false}.
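For example, a fast-thinking request simply sets "think": false. The sketch below builds such a payload and validates the JSON locally before sending; the commented `curl` line reuses the endpoint from the examples above:

```shell
# Fast-thinking request body: "think": false turns the chain of thought off
PAYLOAD='{
  "model": "openpangu_r_72b_2512",
  "messages": [{"role": "user", "content": "Who are you?"}],
  "temperature": 1.0,
  "top_p": 0.8,
  "chat_template_kwargs": {"think": false}
}'
# Validate the JSON locally before sending (python3 is assumed to be available)
echo "$PAYLOAD" | python3 -m json.tool >/dev/null && STATUS=ok || STATUS=bad
echo "payload ${STATUS}"
# curl http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d "$PAYLOAD"
```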