DeepSpeed Inference V2 support for EXAONE 4.0 (merged)

by Bias92 - opened Feb 18

Hi EXAONE team! I've added EXAONE 4.0 model support for DeepSpeed Inference V2, and it has been merged into the main branch.

PR: https://github.com/deepspeedai/DeepSpeed/pull/7853

Key implementation details:

- Post-norm architecture (RMSNorm applied after the attention and MLP blocks)
- QK-Norm (per-head RMSNorm on the Q and K projections)
- Hybrid sliding-window / full attention support for the 32B model
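
For readers unfamiliar with these three pieces, here is a minimal pure-Python sketch of the ideas, not the code from the PR. All function names (`rms_norm`, `qk_norm`, `post_norm_block`, `can_attend`) are hypothetical, and the exact window convention (whether the current token counts toward the window) is an assumption that may differ from the real kernels.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm over a 1-D vector: x / sqrt(mean(x^2) + eps) * weight."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


def qk_norm(proj, num_heads, head_dim, weight):
    """QK-Norm: apply RMSNorm independently to each head's slice of Q or K."""
    out = []
    for h in range(num_heads):
        head = proj[h * head_dim:(h + 1) * head_dim]
        out.extend(rms_norm(head, weight))
    return out


def post_norm_block(x, sublayer, weight):
    """Post-norm residual: normalize the sublayer output, then add the input.

    A pre-norm block would instead compute x + sublayer(rms_norm(x)).
    """
    y = rms_norm(sublayer(x), weight)
    return [a + b for a, b in zip(x, y)]


def can_attend(query_pos, key_pos, window=None):
    """Causal visibility check.

    window=None means full attention. Otherwise only the most recent
    `window` positions (assumed here to include the current one) are visible.
    """
    if key_pos > query_pos:
        return False
    if window is None:
        return True
    return query_pos - key_pos < window


if __name__ == "__main__":
    q = [3.0, 4.0, 6.0, 8.0]  # two heads, head_dim = 2
    print(qk_norm(q, num_heads=2, head_dim=2, weight=[1.0, 1.0]))
    print(post_norm_block([1.0, 2.0], lambda v: [2 * t for t in v], [1.0, 1.0]))
    print([can_attend(5, k, window=4) for k in range(7)])
```

In a hybrid model, some layers would call `can_attend` with a finite `window` and others with `window=None`. Which layers use which mode is defined by the model configuration and is not shown here.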

Both EXAONE-4.0-1.2B and EXAONE-4.0-32B are supported.

Hope this is useful for the community!

lkm2835

LG AI Research org Feb 18

Thank you for your contribution!

Bias92

Feb 18

You're very welcome! I'm glad to be of help. I believe the DeepSpeed Inference V2 support, especially with the QK-Norm and hybrid sliding-window implementation, will significantly boost serving efficiency for the EXAONE 4.0 community. Let me know if there are any further optimizations you'd like to see!
