# Deploying openPangu-R-72B-2512 with Omni-Infer


## Hardware Environment and Deployment Mode
PD co-located deployment (prefill and decode share the same nodes); only 4 dies of a single Atlas 800T A3 machine are required.
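Before deploying, it can help to confirm that the host actually sees the dies. A quick check with the Ascend `npu-smi` tool (it is installed with the driver, and its exact output format varies by driver version):

```shell
# List the NPUs visible on the host; fall back to a hint if npu-smi is absent.
# npu-smi ships with the Ascend driver, so a missing binary usually means a driver problem.
OUT=$(npu-smi info 2>/dev/null || echo "npu-smi not found: check the Ascend driver installation")
echo "$OUT"
```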


## Code and Image
- Omni-Infer code version: release_v0.7.0
- Matching image: see the v0.7.0 images at https://gitee.com/omniai/omniinfer/releases. For A3 hardware on the arm architecture, for example, run `docker pull swr.cn-east-4.myhuaweicloud.com/omni/omniinfer-a3-arm:release_v0.7.0-vllm`.


## Deployment
### 1. Start the container
```bash
IMAGE=swr.cn-east-4.myhuaweicloud.com/omni/omniinfer-a3-arm:release_v0.7.0-vllm
NAME=omniinfer-v0.7.0 # Custom container name
NPU_NUM=16 # Number of dies on the A3 node
DEVICE_ARGS=$(for i in $(seq 0 $((NPU_NUM-1))); do echo -n "--device /dev/davinci${i} "; done)

# Run the container using the variables defined above
# Note: if you run the container on a Docker bridge network, expose the ports needed for multi-node communication in advance
# To prevent device interference from other Docker containers, keep the "--privileged" argument
docker run -itd \
    --name=${NAME} \
    --network host \
    --privileged \
    --ipc=host \
    $DEVICE_ARGS \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/Ascend/firmware:/usr/local/Ascend/firmware \
    -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /mnt/:/mnt/ \
    -v /data:/data \
    -v /home/work:/home/work \
    --entrypoint /bin/bash \
    ${IMAGE}
```
Make sure the model weights and this project's code are accessible from inside the container. Enter the container:
```bash
docker exec -it $NAME /bin/bash
```
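As a quick sanity check inside the container, the passed-through device nodes and driver mount can be verified. This sketch simply counts the `davinci` device nodes exposed by the `--device` arguments above; it is a convenience check, not part of the official procedure:

```shell
# Count the davinci device nodes passed through by the --device arguments
DEVICES=$(ls /dev/davinci[0-9]* 2>/dev/null | wc -l)
echo "visible davinci devices: ${DEVICES}"
# The driver mount from the docker run command should also be present
test -d /usr/local/Ascend/driver && echo "driver mount ok" || echo "driver mount missing"
```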


### 2. Copy examples/start_serving_openpangu_r_72b_2512.sh into omniinfer/tools/scripts and run it

```bash
git clone -b release_v0.7.0 https://gitee.com/omniai/omniinfer.git
cd omniinfer/tools/scripts
# Edit the serving script first: set model-path (path to the model weights), master-ip (this machine's IP address), and PYTHONPATH.
bash start_serving_openpangu_r_72b_2512.sh
```
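Loading the 72B weights can take a while, so the service is not ready the moment the script starts. A polling sketch against the `/health` endpoint of vLLM's OpenAI-compatible server (port 8000 and the endpoint path are assumptions here; adjust them to match the serving script):

```shell
# Poll until the service answers; tune RETRIES/SLEEP via the environment.
# A 72B checkpoint can take minutes to load, so raise RETRIES for real deployments.
PORT=${PORT:-8000}
RETRIES=${RETRIES:-10}
SLEEP=${SLEEP:-2}
ready=no
for i in $(seq 1 "$RETRIES"); do
  if curl -sf --max-time 2 "http://127.0.0.1:${PORT}/health" >/dev/null; then
    ready=yes
    break
  fi
  sleep "$SLEEP"
done
echo "service ready: ${ready}"
```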

### 3. Send a test request

Once the service is up, you can send a test request.

```bash
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openpangu_r_72b_2512",
    "messages": [
        {
            "role": "user",
            "content": "Who are you?"
        }
    ],
    "temperature": 1.0,
    "top_p": 0.8,
    "top_k": -1,
    "vllm_xargs": {"top_n_sigma": 0.05},
    "chat_template_kwargs": {"think": true, "reasoning_effort": "low"}
}'
```
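The endpoint returns the OpenAI-compatible response schema. Below is a sketch of extracting the assistant message from a captured response; the `RESPONSE` value is a hypothetical, abbreviated example, and `python3` is assumed to be available:

```shell
# Hypothetical response in the OpenAI-compatible shape returned by the endpoint above
RESPONSE='{"choices":[{"message":{"role":"assistant","content":"I am openPangu."}}]}'
# Pull out choices[0].message.content with the Python json module
CONTENT=$(echo "$RESPONSE" | python3 -c 'import sys, json; print(json.load(sys.stdin)["choices"][0]["message"]["content"])')
echo "$CONTENT"
```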
```bash
# Tool-calling example
curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openpangu_r_72b_2512",
    "messages": [
        {"role": "system", "content": "You are the Pangu model developed by Huawei.\nToday is July 30, 2025."},
        {"role": "user", "content": "What will the weather be like in Shenzhen tomorrow?"}
    ],
    "tools": [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a given city, including temperature, humidity, wind speed, and other data.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City name, e.g. Beijing or Shenzhen. Chinese or pinyin input is supported."
                    },
                    "date": {
                        "type": "string",
                        "description": "Query date in YYYY-MM-DD format (ISO 8601), e.g. 2023-10-01."
                    }
                },
                "required": ["location", "date"],
                "additionalProperties": false
            }
        }
    }
    ],
    "temperature": 1.0,
    "top_p": 0.8,
    "top_k": -1,
    "vllm_xargs": {"top_n_sigma": 0.05},
    "chat_template_kwargs": {"think": true, "reasoning_effort": "high"}
}'
```
The model defaults to slow-thinking (deep reasoning) mode. In this mode the chain of thought supports tiered effort levels: setting "reasoning_effort" to "high" or "low" in the request field "chat_template_kwargs": {"think": true, "reasoning_effort": "high"} trades accuracy against efficiency.
Slow-thinking mode itself can be enabled or disabled via the request field "chat_template_kwargs": {"think": true/false}.
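For example, a fast-thinking request simply sets "think": false. The sketch below builds such a payload and validates the JSON locally before sending; the commented `curl` line reuses the endpoint from the examples above:

```shell
# Fast-thinking request body: "think": false turns the chain of thought off
PAYLOAD='{
  "model": "openpangu_r_72b_2512",
  "messages": [{"role": "user", "content": "Who are you?"}],
  "temperature": 1.0,
  "top_p": 0.8,
  "chat_template_kwargs": {"think": false}
}'
# Validate the JSON locally before sending (python3 is assumed to be available)
echo "$PAYLOAD" | python3 -m json.tool >/dev/null && STATUS=ok || STATUS=bad
echo "payload ${STATUS}"
# curl http://0.0.0.0:8000/v1/chat/completions -H "Content-Type: application/json" -d "$PAYLOAD"
```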