Mistral AI has launched Magistral, its first series of reasoning models, available in two versions: Magistral Small (open-source) and Magistral Medium (enterprise-grade, available via API and Mistral’s Le Chat). These models are based on a transformer architecture fine-tuned through Mistral’s Reinforcement Learning from Verifiable Rewards (RLVR) framework, which replaces external critics with a generator–verifier setup. This approach yields transparent, step-by-step “chain-of-thought” reasoning at scale.
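The core idea behind the generator–verifier setup can be sketched in a few lines: instead of a learned critic (value network), each prompt’s sampled completions are scored by a verifier, and each completion’s advantage is its reward normalized against its own group. This is a simplified illustration of GRPO’s group-relative advantage, not Mistral’s actual pipeline:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as in GRPO: each sampled completion's
    reward is normalized against its own group's mean and standard
    deviation, removing the need for a learned critic."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# A verifiable reward is binary: 1.0 if the answer checks out, else 0.0
print(grpo_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

Completions whose answers the verifier accepts get a positive advantage, and the policy is updated to make them more likely.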
Here’s a brief overview of Magistral Small’s specifications:
Magistral Small

| Specification | Details |
|---|---|
| Architecture | Transformer fine-tuned via Reinforcement Learning from Verifiable Rewards (RLVR), with Group Relative Policy Optimization (GRPO) as the RL algorithm |
| Parameters | 24 billion |
| Context window | 128k tokens maximum, 40.9k tokens recommended |
| Licensing | Apache 2.0 |
Magistral Small’s benchmarks demonstrate strong overall performance, exceeding the Llama 4 models on most benchmarks but trailing DeepSeek R1 and the Qwen 3 series.

| Model | AIME24 | AIME25 | GPQA Diamond | LiveCodeBench |
|---|---|---|---|---|
| Magistral Small | 70.68 | 62.76 | 68.18 | 55.84 |
| Qwen 3 32B (Dense) | 81.4 | 72.9 | N/A | 65.7 |
| Qwen 3 30B A3B (MoE) | 80.4 | 70.9 | 65.8 | 62.6 |
| DeepSeek R1 | 79.8 | 70 | 71.5 | 65.9 |
| DeepSeek V3 | 39.2 | 28.8 | 59.1 | 36.2 |
| Llama 4 Maverick | N/A | N/A | 69.8 | 43.4 |
| Llama 4 Scout | N/A | N/A | 57.2 | 32.8 |

Source: Llama 4, Qwen 3, Magistral & DeepSeek
How to run Magistral Small with Ollama
Prerequisites
Create a GPU virtual machine (VM) on Ori Global Cloud. We chose a setup with an NVIDIA L40S GPU and Ubuntu 22.04 as the OS, since we ran the Q8_0 quantized version; you may need an H100 GPU if you choose the FP16 version of the model.
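A rough back-of-the-envelope for the GPU choice: Q8_0 stores roughly one byte per parameter and FP16 two, and the KV cache and activations add several more GB on top (a simplification, not an exact sizing):

```python
def approx_weight_vram_gb(params_billion, bytes_per_param):
    # Weights only; the KV cache and activations add several GB on top
    return params_billion * bytes_per_param

print(approx_weight_vram_gb(24, 1))  # ~24 GB for Q8_0 -> fits a 48 GB L40S
print(approx_weight_vram_gb(24, 2))  # ~48 GB for FP16 -> calls for an 80 GB H100
```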
Step 1: Install the venv module and create a Python virtual environment
apt install python3.11-venv
python3.11 -m venv mistral-env
Step 2: Activate the virtual environment
source mistral-env/bin/activate
Step 3: Install
Ollama and specify the number of GPUs to be used
curl -fsSL https://ollama.com/install.sh | sh
Step 4: Run Magistral Small 24B (quantized Q8_0)
ollama run magistral:24b-small-2506-q8_0
/set verbose

Inside the Ollama prompt, /set verbose prints timing statistics such as tokens per second for each response.
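Besides the interactive prompt, Ollama also exposes a local HTTP API on port 11434. A minimal sketch of calling it from Python, assuming the Ollama service from the step above is running on the same machine:

```python
import json
import urllib.request

def build_request(prompt, model="magistral:24b-small-2506-q8_0"):
    """Build a non-streaming payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def ask_magistral(prompt, host="http://localhost:11434"):
    """Send one generation request to Ollama and return the response text."""
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

This is handy for scripting benchmarks against the model instead of pasting prompts by hand.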
Step 5: Install Open WebUI on the VM in another terminal window and run it
pip install open-webui
open-webui serve
Step 6: Access Open WebUI in your browser through the default port 8080.
http://<VM-IP>:8080/
Click on “Get Started” to create an Open WebUI account if you haven’t set one up on the virtual machine before.

Step 7: Choose magistral:24b-small-2506-q8_0 from the Models drop-down and chat away!
Is Magistral Small better than Mistral Small 3?
We tried out the Mistral Small 3 model a few months ago, so we tested Magistral with the prompts on which Small 3 didn’t do too well.
Prompt: How many ‘r’s in “strawberry” ?
Mistral Small 3: The word "strawberry" contains 2 letter “r”s
Magistral Small: 3

Prompt: How many ‘l’s in “strawberry” ?
Mistral Small 3: The word "strawberry" contains 2 letter “l”s
Magistral Small: 0

Prompt: Compute the area of the region enclosed by the graphs of the given equations “y=x, y=2x, and y=6-x”. Use vertical cross-sections
Mistral Small 3: 7
Magistral Small: 3
The correct answer is 3 (or 3 square units).
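The answer can be verified with the same vertical-slice setup the prompt asks for: the three lines intersect pairwise at (0, 0), (2, 4), and (3, 3), so the strip height changes at x = 2:

```python
# Vertical cross-sections: for 0 <= x <= 2 the strip runs from y = x
# up to y = 2x; for 2 <= x <= 3 it runs from y = x up to y = 6 - x.
def enclosed_area():
    # integral of (2x - x) dx from 0 to 2  ->  x^2 / 2 evaluated at 2
    part1 = 2**2 / 2
    # integral of ((6 - x) - x) dx from 2 to 3  ->  6x - x^2 evaluated 2..3
    part2 = (6 * 3 - 3**2) - (6 * 2 - 2**2)
    return part1 + part2

print(enclosed_area())  # 3.0 square units
```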

Overall, Magistral Small shows a significant leap over Mistral Small 3 in terms of performance. The enhanced accuracy makes the benefits of a reasoning model quite evident here, indicating that reasoning models are the way forward for stronger performance.
Our take on Magistral Small
Speed
Magistral is comparable in speed with frontier open-source models such as Qwen 3, generating more than 26 tokens per second. Both models answered the question below correctly, but Magistral took only 1 minute 0.4 seconds whereas Qwen 3 took 1 minute 38 seconds.
Prompt: What is larger: 134.59 or 134.6?
Magistral:

Qwen 3:

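Comparisons like this trip up language models because the fractional digits tend to get compared as integers (59 > 6) rather than as decimals; ordinary arithmetic has no such trouble:

```python
from decimal import Decimal

# 134.6 is 134.60, so it exceeds 134.59 despite "59" > "6" digit-wise
print(Decimal("134.6") > Decimal("134.59"))  # True
print(134.6 > 134.59)                        # True for floats as well
```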
Accuracy
In our observation, Magistral Small is nearly as good as Qwen 3 with some exceptions.
Prompt: Exactly how many days ago did the French Revolution start? Today is June 11th, 2025.
Magistral got this question completely wrong, answering 460 days; the response also took 17 minutes.

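As a sanity check, the expected answer can be computed with Python’s datetime, assuming the conventional start date of July 14, 1789 (the storming of the Bastille):

```python
from datetime import date

# Days between the storming of the Bastille and June 11, 2025
start = date(1789, 7, 14)
today = date(2025, 6, 11)
print((today - start).days)  # 86164 -- a far cry from Magistral's 460
```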
Magistral Small failed to generate working code for the Tetris game, whereas Qwen 3 got it right in one shot.

Both models failed to generate code that satisfied this prompt:
Prompt: "write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically"
Magistral Response:
Flexibility
The absence of a non-reasoning mode makes Magistral Small less flexible than Qwen 3. Magistral can go into very long reasoning loops lasting several minutes, which hampers many use cases, especially when its responses to those prompts are incorrect.
Overall, Magistral is an impressive reasoning model from Mistral and a preview of stronger reasoning models that are set to emerge from leading AI labs. Although it is quite accurate and fast in terms of performance, the lack of a non-reasoning mode makes it less flexible especially for simple prompts.
Build your enterprise AI on Ori
Ori Global Cloud provides flexible infrastructure for any team, model, and scale. Backed by top-tier GPUs, performant storage, and AI-ready networking, Ori enables AI teams and businesses to deploy their AI models and applications in a variety of ways:
- Deploy inference effortlessly with Serverless and Dedicated Endpoints.