A pure C++ high-performance OpenAI LLM service powered by TensorRT-LLM and GRPS, with support for QWQ.
#22
by zhaocc1106 - opened
grps-trtllm now supports QWQ-32B. Feel free to give it a try if you are interested.
https://github.com/NetEase-Media/grps_trtllm/blob/master/docs%2Fqwq.md
zhaocc1106 changed discussion title from A pure C++ high-performance OpenAI LLM service by TensorRT-llm + GRPS. to A pure C++ high-performance OpenAI LLM service powered by TensorRT-LLM and GRPS, with support for QWQ.