Tutorial

How to run Magistral Small on a cloud GPU

Learn how to deploy Mistral’s open-source Magistral Small model on a cloud GPU using Ollama and Open WebUI, plus our analysis of the Magistral model.

Mistral AI has launched Magistral, its first series of reasoning models, available in two versions: Magistral Small (open-source) and Magistral Medium (enterprise-grade, access via API and Mistral’s Le Chat). These models are based on a transformer architecture fine-tuned through Mistral’s proprietary Reinforcement Learning from Verifiable Rewards (RLVR) framework, which replaces external critics with a generator–verifier setup. This approach yields transparent, step-by-step “chain‑of‑thought” reasoning at scale.
 
Here’s a brief overview of Magistral Small’s specifications:

Architecture: Transformer, fine-tuned with Reinforcement Learning from Verifiable Rewards (RLVR) using Group Relative Policy Optimization (GRPO) as the RL algorithm
Parameters: 24 billion
Context window: 128k tokens maximum, 40.9k tokens recommended
Licensing: Apache 2.0 (commercial and research use)
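For context, GRPO (introduced with DeepSeek's work on reasoning models and adopted by Mistral here) dispenses with the separate critic network used by PPO-style methods: for each prompt it samples a group of G completions, scores each one with the verifiable reward, and uses the group-normalized score as the advantage. A sketch of that advantage term, based on the published GRPO formulation rather than Mistral's exact training code:

\[
\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}{\operatorname{std}(r_1, \dots, r_G)}
\]

where r_i is the verifiable reward of the i-th sampled completion.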

Magistral Small’s benchmarks demonstrate strong overall performance, exceeding the Llama 4 models but trailing DeepSeek R1 and the Qwen 3 series.

 
Model                    AIME24    AIME25    GPQA Diamond    LiveCodeBench
Magistral Small          70.68     62.76     68.18           55.84
Qwen 3 32B (Dense)       81.4      72.9      N/A             65.7
Qwen 3 30B A3B (MoE)     80.4      70.9      65.8            62.6
DeepSeek R1              79.8      70.0      71.5            65.9
DeepSeek V3              39.2      28.8      59.1            36.2
Llama 4 Maverick         N/A       N/A       69.8            43.4
Llama 4 Scout            N/A       N/A       57.2            32.8

Source: Llama 4, Qwen 3, Magistral & DeepSeek

How to run Magistral Small with Ollama

Prerequisites

Create a GPU virtual machine (VM) on Ori Global Cloud. We chose a setup with an NVIDIA L40S GPU and Ubuntu 22.04 as our OS, since we ran the Q8_0 quantized version of the model; if you choose the FP16 version, you may need an H100 GPU instead.

Quick tip
Use the init script when creating the VM so NVIDIA CUDA drivers, frameworks such as PyTorch or TensorFlow, and Jupyter notebooks are preinstalled for you.
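Once the VM is up, it's worth confirming that the driver can see the GPU before installing anything (this assumes the NVIDIA drivers are already present, for example via the init script above):

nvidia-smi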

Step 1: SSH into your VM, install Python, and create a virtual environment
apt install python3.11-venv
python3.11 -m venv mistral-env
 
Step 2: Activate the virtual environment
source mistral-env/bin/activate
 
Step 3: Install Ollama and (optionally) choose which GPU it uses
curl -fsSL https://ollama.com/install.sh | sh
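The install script registers Ollama as a background service and uses the GPU automatically. If your VM has more than one GPU and you want to pin Ollama to a specific device, one option is to stop the service and launch the server yourself with CUDA_VISIBLE_DEVICES set; this is a rough sketch, and the service name or setup may differ on your machine:

sudo systemctl stop ollama
CUDA_VISIBLE_DEVICES=0 ollama serve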
 
Step 4: Run Magistral 24B Small (Quantized Q8_0)
ollama run magistral:24b-small-2506-q8_0 --verbose
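The interactive prompt is handy for quick tests, but Ollama also exposes a local REST API on port 11434, which is useful for scripted benchmarks. A minimal example (the prompt text is just an illustration):

curl http://localhost:11434/api/generate -d '{
  "model": "magistral:24b-small-2506-q8_0",
  "prompt": "How many times does the letter r appear in strawberry?",
  "stream": false
}'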
 
Step 5: Install Open WebUI on the VM via another terminal window and run it
pip install open-webui
open-webui serve
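By default, open-webui serve listens on port 8080; the CLI also accepts a --port flag if that port is taken. To keep the server running after you close the SSH session, standard shell backgrounding works (the log file name is our own choice):

nohup open-webui serve --port 8080 > openwebui.log 2>&1 &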
 
Step 6: Access Open WebUI in your browser through the default 8080 port.
http://<VM-IP>:8080/
 
Click “Get Started” to create an Open WebUI account if you haven’t set one up on this virtual machine before.
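If you would rather not expose port 8080 to the internet, an SSH tunnel from your local machine is a common alternative (user and VM-IP are placeholders for your own values):

ssh -L 8080:localhost:8080 user@VM-IP

Open WebUI is then available at http://localhost:8080/ in your local browser.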

[Screenshot: Magistral running in Open WebUI]

Step 7: Choose magistral:24b-small-2506-q8_0 from the Models dropdown and chat away!

Is Magistral Small better than Mistral Small 3?

We tried out the Mistral Small 3 model a few months ago, so we tested Magistral on the prompts that Small 3 didn’t handle well.
 
Prompt: How many ‘r’s in “strawberry” ?
 
Mistral Small 3: The word "strawberry" contains 2 letter “r”s
 
Magistral Small: 3
 

[Screenshot: Magistral's letter-counting responses]

Prompt: How many ‘l’s in “strawberry” ?
 
Mistral Small 3:  The word "strawberry" contains 2 letter “l”s
 
Magistral Small: 0
 

Prompt: Compute the area of the region enclosed by the graphs of the given equations “y=x, y=2x, and y=6-x”. Use vertical cross-sections
 
Mistral Small 3: 7
 
Magistral Small: 3
 
The correct answer is 3 (or 3 square units).
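For reference, here is our own quick check (not the model's working): the three lines intersect pairwise at (0,0), (2,4), and (3,3), so integrating over vertical cross-sections gives

\[
A = \int_0^2 (2x - x)\,dx + \int_2^3 \bigl((6 - x) - x\bigr)\,dx = \Bigl[\tfrac{x^2}{2}\Bigr]_0^2 + \Bigl[6x - x^2\Bigr]_2^3 = 2 + 1 = 3.
\]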

 

[Screenshot: Magistral's solution to the area problem]

Overall, Magistral Small shows a significant leap over Mistral Small 3 in terms of performance. The benefits of a reasoning model are evident in the enhanced accuracy, indicating that reasoning models are the way forward for stronger performance.

Our take on Magistral Small

Speed

Magistral is comparable in speed with frontier open-source models such as Qwen 3, generating more than 26 tokens per second.
 
Both models answered the question below correctly, but Magistral took only 1 minute and 0.4 seconds whereas Qwen 3 took 1 minute and 38 seconds.
 
Prompt: What is larger: 134.59 or 134.6?
 
Magistral:

[Screenshot: Magistral's timed response]

Qwen 3:

[Screenshot: Qwen 3's timed response]
Accuracy

In our observation, Magistral Small is nearly as good as Qwen 3, with some exceptions.
 
Prompt: Exactly how many days ago did the French Revolution start? Today is June 11th, 2025.
Magistral got this question completely wrong, answering 460 days. The response also took 17 minutes.
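By our count, the correct answer is 86,164 days, taking July 14, 1789 (the storming of the Bastille) as the conventional start date. The figure is easy to verify on the VM with GNU date; the one-liner below is our own check, not part of the original test:

echo $(( ( $(date -u -d 2025-06-11 +%s) - $(date -u -d 1789-07-14 +%s) ) / 86400 ))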
 

Magistral Small failed to generate fully working code for a Tetris game, whereas Qwen 3 got it right in one shot.
 

[Screenshot: Magistral's Tetris attempt]

Both models failed to generate code that satisfied this prompt:
 
Prompt: "write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically"
 
Magistral's response:

[Screenshot: Magistral's bouncing-ball attempt]

Flexibility

The absence of a non-reasoning mode makes Magistral Small less flexible than Qwen 3. Magistral can fall into very long reasoning loops that run for several minutes, which is a problem for many use cases, especially when the eventual responses are incorrect.
 
Overall, Magistral is an impressive reasoning model from Mistral and a preview of the stronger reasoning models set to emerge from leading AI labs. Although it is accurate and fast, the lack of a non-reasoning mode makes it less flexible, especially for simple prompts.
 

Build your enterprise AI on Ori

Ori Global Cloud provides flexible infrastructure for any team, model, and scale. Backed by top-tier GPUs, performant storage, and AI-ready networking, Ori enables AI teams and businesses to deploy their AI models and applications in a variety of ways.
 
