# openPangu-R-72B-2512 Deployment Guide for Omni-Infer

## Hardware Environment and Deployment Mode

Mixed prefill/decode (PD) deployment; only 4 dies of a single Atlas 800T A3 machine are required.

## Code and Image

- Omni-Infer code version: release_v0.7.0
- Companion image: see the v0.7.0 images at https://gitee.com/omniai/omniinfer/releases. Taking A3 hardware and the arm architecture as an example, pull with `docker pull swr.cn-east-4.myhuaweicloud.com/omni/omniinfer-a3-arm:release_v0.7.0-vllm`.

## Deployment

### 1. Start the Image

```bash
IMAGE=swr.cn-east-4.myhuaweicloud.com/omni/omniinfer-a3-arm:release_v0.7.0-vllm
NAME=omniinfer-v0.7.0  # Custom docker name
NPU_NUM=16             # Number of dies on an A3 node
DEVICE_ARGS=$(for i in $(seq 0 $((NPU_NUM-1))); do echo -n "--device /dev/davinci${i} "; done)

# Run the container using the variables defined above.
# Note: if you are running docker with a bridge network, expose the ports
# needed for multi-node communication in advance.
# To prevent device interference from other docker containers, add the
# argument "--privileged".
docker run -itd \
    --name=${NAME} \
    --network host \
    --privileged \
    --ipc=host \
    $DEVICE_ARGS \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
    -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /mnt/:/mnt/ \
    -v /data:/data \
    -v /home/work:/home/work \
    --entrypoint /bin/bash \
    ${IMAGE}
```

Make sure the model weights and this project's code are accessible inside the container. Enter the container:

```bash
docker exec -it $NAME /bin/bash
```

### 2. Copy examples/start_serving_openpangu_r_72b_2512.sh into omniinfer/tools/scripts and run it

```bash
git clone -b release_v0.7.0 https://gitee.com/omniai/omniinfer.git
cd omniinfer/tools/scripts
# Before running, edit the serving script: set model-path (model weights path),
# master-ip (machine IP address), and PYTHONPATH.
bash start_serving_openpangu_r_72b_2512.sh
```

### 3. Send a Test Request

Once the service is up, send a test request:

```bash
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openpangu_r_72b_2512",
    "messages": [
      {
        "role": "user",
        "content": "Who are you?"
      }
    ],
    "temperature": 1.0,
    "top_p": 0.8,
    "top_k": -1,
    "vllm_xargs": {"top_n_sigma": 0.05},
    "chat_template_kwargs": {"think": true, "reasoning_effort": "low"}
  }'
```

```bash
# Tool use
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openpangu_r_72b_2512",
    "messages": [
      {"role": "system", "content": "You are the Pangu model developed by Huawei.\nToday is July 30, 2025"},
      {"role": "user", "content": "What will the weather be like in Shenzhen tomorrow?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather for a given city, including temperature, humidity, wind speed, and other data.",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City name, e.g. Beijing, Shenzhen. Chinese or pinyin input is supported."
              },
              "date": {
                "type": "string",
                "description": "Query date in YYYY-MM-DD format (ISO 8601), e.g. 2023-10-01."
              }
            },
            "required": ["location", "date"],
            "additionalProperties": false
          }
        }
      }
    ],
    "temperature": 1.0,
    "top_p": 0.8,
    "top_k": -1,
    "vllm_xargs": {"top_n_sigma": 0.05},
    "chat_template_kwargs": {"think": true, "reasoning_effort": "high"}
  }'
```

The model defaults to slow-thinking mode. In this mode it supports tiered chain-of-thought: setting "reasoning_effort" to "high" or "low" in the request-body field "chat_template_kwargs": {"think": true, "reasoning_effort": "high"} trades model accuracy against efficiency.

Slow-thinking mode itself can be switched on or off via the request-body field "chat_template_kwargs": {"think": true/false}.
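As a minimal sketch of switching slow thinking off, the snippet below builds a request body with "think": false and validates that it is well-formed JSON before sending. The endpoint, model name, and sampling parameters follow the examples above; the commented-out `curl` line assumes the service from the earlier steps is listening on 0.0.0.0:8000.

```shell
# Build a fast-thinking ("think": false) request body and sanity-check it
# as JSON before sending it to the service.
BODY='{
  "model": "openpangu_r_72b_2512",
  "messages": [{"role": "user", "content": "1+1=?"}],
  "temperature": 1.0,
  "top_p": 0.8,
  "chat_template_kwargs": {"think": false}
}'

# Validate the payload; prints "payload OK" if it parses as JSON.
echo "$BODY" | python3 -m json.tool > /dev/null && echo "payload OK"

# Then send it (requires the serving process from step 2 to be running):
# curl http://0.0.0.0:8000/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$BODY"
```

Keeping the body in a shell variable makes it easy to toggle "think" or "reasoning_effort" in one place when comparing the two modes.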