Was this built on DeepSeek V3 0324?

#6
by nova434431 - opened

From the model card: "trained from the ground up"

From our family of large models, Mistral Large 3 is a state-of-the-art general-purpose Multimodal granular Mixture-of-Experts model with 41B active parameters and 675B total parameters trained from the ground up with 3000 H200s.

and because they said it, it has to be true, right?

@evewashere Would you like some help learning how to inspect the model architecture or look at the vLLM commits to see how inference works?

Mistral AI_ org

If you compare the configurations of this model and DS V3 0324, you can see that this model has fewer but "fatter" experts:

Compare:
https://huggingface.co/deepseek-ai/DeepSeek-V3-0324/blob/main/config.json#L23

"moe_intermediate_size": 2048,
"n_routed_experts": 256,

with:
https://huggingface.co/mistralai/Mistral-Large-3-675B-Instruct-2512/blob/main/params.json

"expert_hidden_dim": 4096,
"num_experts": 128,

This should make it clear that ML3 is not trained/built on top of DS3. In addition, we use different rope scaling, select fewer experts per token and have an integrated vision encoder.
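To make the "fewer but fatter" point concrete, here is a minimal sketch comparing the two sets of expert hyperparameters quoted above (only the values cited in this thread; a full parameter-count comparison would also need each model's hidden size, which isn't quoted here):

```python
# Expert-layer hyperparameters as quoted from each repo's config file.
ds_v3 = {"n_routed_experts": 256, "moe_intermediate_size": 2048}
ml3 = {"num_experts": 128, "expert_hidden_dim": 4096}

# ML3 routes over half as many experts...
expert_count_ratio = ml3["num_experts"] / ds_v3["n_routed_experts"]
# ...but each expert's FFN intermediate dimension is twice as wide.
expert_width_ratio = ml3["expert_hidden_dim"] / ds_v3["moe_intermediate_size"]

print(f"expert count ratio (ML3 / DS-V3): {expert_count_ratio}")  # 0.5
print(f"expert width ratio (ML3 / DS-V3): {expert_width_ratio}")  # 2.0
```

If the two models shared a lineage, you would expect these shapes to match (or differ by a fine-tune-compatible transformation), not to be structurally different like this.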

It is, though, obviously heavily inspired by DS3 (the model architecture is more or less the same, as you can see from this file: https://github.com/vllm-project/vllm/blob/83319b44c26af45de4753c74f55a07df8c637a25/vllm/model_executor/models/mistral_large_3.py#L11), similar to Kimi.

patrickvonplaten changed discussion status to closed