Tutorial

How to run Mistral Small 3 on a cloud GPU with vLLM

Discover how to easily deploy Mistral Small 3 on a cloud GPU with vLLM, along with our analysis of the model on verbal, math, and coding prompts.

Mistral AI has established itself as a key player in the generative AI space by developing open, portable, and highly customizable models. A defining strength of its model portfolio is the ability to deliver exceptional performance with relatively compact models, such as Pixtral 12B and the Ministral series. With the launch of Mistral Small 3, the company has gone a step further, delivering a compact yet performant model for a variety of language, coding, and math tasks.

Mistral Small 3 supports multilingual input and output in 11 languages: English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish. Mistral is targeting a range of business use cases with this model: fraud detection, virtual customer service, sentiment analysis, feedback systems, customer triaging, and more.

Here’s a quick rundown of Mistral Small 3 24B specifications: 

  • Architecture: Designed for low latency; Tekken tokenizer with a 131k vocabulary size
  • Variants: Pretrained base model and instruction-tuned
  • Parameters: 24 billion
  • Context window: 32k tokens
  • Licensing: Apache 2.0 (commercial and non-commercial use)
Benchmark results shared by Mistral indicate that Small 3 outperforms GPT-4o mini and Gemma 2 27B, while coming close to Llama 3.3 70B and Qwen 2.5 32B in terms of performance.
 
Mistral Small 3 Performance
Source: Mistral Blog
 

How to use Mistral Small 3 with vLLM on an Ori virtual machine

Prerequisites

Create a GPU virtual machine (VM) on Ori Global Cloud. We chose the NVIDIA H100 SXM GPU because the vLLM server for Mistral Small 3 Instruct in BF16 format needs about 71 GB of memory, which includes model weights, PyTorch activations, and KV cache. If you’re using smaller, quantized versions of the model, we recommend the NVIDIA L40S GPU for its strong inference performance.
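As a rough sanity check on that figure, serving memory can be approximated from the parameter count. A back-of-the-envelope sketch (the helper name and the overhead fraction are our assumptions, not vLLM figures):

```python
# Rough GPU memory estimate for serving a model in BF16 (2 bytes per parameter).
# overhead_fraction is an assumed allowance for activations and KV cache.
def estimate_serving_memory_gb(params_billions, bytes_per_param=2, overhead_fraction=0.5):
    weights_gb = params_billions * 1e9 * bytes_per_param / 1e9
    return weights_gb * (1 + overhead_fraction)

print(estimate_serving_memory_gb(24))  # → 72.0, in the same ballpark as the ~71 GB observed
```

For a 24B-parameter model, the weights alone take about 48 GB in BF16, which is why a single 80 GB H100 is a comfortable fit while smaller cards need quantized variants.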
 

 

Quick tip
Use the init script when creating the VM so NVIDIA CUDA drivers, frameworks such as Pytorch or Tensorflow and Jupyter notebooks are preinstalled for you.

 

  
 
Step 1: SSH into your VM, install Python and create a virtual environment
apt install python3.11-venv
python3.11 -m venv mistral-env
 
Activate the virtual environment
source mistral-env/bin/activate
 
Step 2: Install the Hugging Face CLI if you haven’t already, and log in. Have your Hugging Face token ready, and make sure you have accepted the model’s terms and conditions.
 
pip install -U "huggingface_hub[cli]"
huggingface-cli login
 
Step 3: Install vLLM and update tokenizer
pip install vllm
pip install --upgrade mistral_common
 
Step 4: Load the model and run the vLLM server. We chose the Mistral Small 24B Instruct variant for its better performance. 
vllm serve "mistralai/Mistral-Small-24B-Instruct-2501"

The default port for vLLM is 8000; you can change it with the --port option.

Note: It'll take a while to load all the model shards for the first time. 
 
Step 5: Query the model from another terminal window, using the JSON utility jq to format model responses
sudo apt install jq

curl -s "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mistral-Small-24B-Instruct-2501",
    "temperature": 0.15,
    "messages": [
      {"role": "user", "content": "How many times does the letter r appear in the word strawberry?"}
    ]
  }' | jq -r '.choices[0].message.content'
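The same request can be made from Python, since vLLM exposes an OpenAI-compatible API. A minimal sketch (the helper names are ours; the payload and response shapes match the curl example above, and the server is assumed to be on the default port 8000):

```python
import json

# Build an OpenAI-compatible chat completion payload for the vLLM server.
def build_chat_payload(model, prompt, temperature=0.15):
    return {
        "model": model,
        "temperature": temperature,
        "messages": [{"role": "user", "content": prompt}],
    }

# Pull the assistant's reply text out of a decoded chat completion response.
def extract_content(response):
    return response["choices"][0]["message"]["content"]

payload = build_chat_payload(
    "mistralai/Mistral-Small-24B-Instruct-2501",
    "How many times does the letter r appear in the word strawberry?",
)
print(json.dumps(payload, indent=2))
# POST this payload to http://localhost:8000/v1/chat/completions,
# then pass the decoded JSON body to extract_content().
```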
 
 

How good is Mistral Small 3?

We threw a few verbal, math, and coding reasoning challenges at Mistral Small 3 24B. The model did quite well on verbal intelligence, but at times it struggled with our coding and math questions.
 
Prompt: What is the best French cheese?
 
Mistral Verbal Analysis
 
Mistral Small 3 handled this subjective prompt quite well: it laid out some of the options and their strengths, and let the person asking the question decide based on those attributes.
 
Prompt: Translate this sentence to French, German and Italian: “Visit our website to learn more about end-to-end AI infrastructure. Go to www.ori.co.”
 

Mistral Small 3 Translations

The model understood the question correctly and translated the statement with reasonable accuracy.

Prompt: How many ‘r’s in “strawberry” ?
 
Mistral Small 3 Strawberry Test
 

The model got this question wrong: the word strawberry contains 3 ‘r’s.

Prompt: How many ‘l’s in “strawberry”?

Mistral Small 3 Strawberry Prompt

Mistral Small 3 answered this prompt incorrectly too.
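Both letter-counting prompts can be checked directly in Python, which shows the answers the model should have given:

```python
# Ground truth for the two letter-counting prompts above.
word = "strawberry"
print(word.count("r"))  # → 3
print(word.count("l"))  # → 0
```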

Prompt: Write the game Snake in Python

Creating a simple version of the classic Snake game in Python can be a fun and educational project. We'll use the `tkinter` library for the graphical interface and `random` for generating food positions.

Below is a basic implementation of Snake using `tkinter`:

import tkinter as tk
import random

class SnakeGame:
    def __init__(self, master):
        self.master = master
        self.master.title("Snake Game")
        self.master.resizable(False, False)
        self.canvas = tk.Canvas(self.master, width=400, height=400, bg="black")
        self.canvas.pack()
        self.snake = [(100, 50), (90, 50), (80, 50)]
        self.food = self.spawn_food()
        self.direction = "Right"
        self.canvas.bind_all("<KeyPress>", self.change_direction)
        self.game_loop()

    def change_direction(self, event):
        if event.keysym == "Up" and self.direction != "Down":
            self.direction = "Up"
        elif event.keysym == "Down" and self.direction != "Up":
            self.direction = "Down"
        elif event.keysym == "Left" and self.direction != "Right":
            self.direction = "Left"
        elif event.keysym == "Right" and self.direction != "Left":
            self.direction = "Right"

    def game_loop(self):
        head = self.snake[0]
        if self.direction == "Up":
            new_head = (head[0], head[1] - 10)
        elif self.direction == "Down":
            new_head = (head[0], head[1] + 10)
        elif self.direction == "Left":
            new_head = (head[0] - 10, head[1])
        elif self.direction == "Right":
            new_head = (head[0] + 10, head[1])
        self.snake.insert(0, new_head)
        # Check for collisions
        if (
            new_head in self.snake[len(self.snake) - 1:len(self.snake)]  # Self collision
            or new_head[0] < 0
            or new_head[0] >= 400
            or new_head[1] < 0
            or new_head[1] >= 400
        ):
            self.game_over()
            return
        if new_head == self.food:
            self.food = self.spawn_food()
        else:
            self.snake.pop()
        self.canvas.delete("all")
        self.draw_snake()
        self.canvas.create_oval(self.food[0], self.food[1], self.food[0] + 10, self.food[1] + 10, fill="red")
        self.master.after(100, self.game_loop)

    def draw_snake(self):
        for segment in self.snake:
            self.canvas.create_rectangle(
                segment[0], segment[1], segment[0] + 10, segment[1] + 10, fill="white"
            )

    def spawn_food(self):
        while True:
            food = (random.randint(0, 39) * 10, random.randint(0, 39) * 10)
            if food not in self.snake:
                return food

    def game_over(self):
        self.canvas.create_text(200, 200, text="Game Over", font=("Helvetica", 24), fill="white")

if __name__ == "__main__":
    root = tk.Tk()
    game = SnakeGame(root)
    root.mainloop()

The code ran successfully and generated a basic game of Snake quite well. Here’s a snapshot of the game from the code generated by Mistral Small 3.

 
Prompt: What is Agentic AI? Explain like I’m five.
Mistral Agentic AI
 
Prompt: Write the Tetris game in Python.
The code generated by Mistral failed to run on the first pass.
 
Prompt: Give me a concise summary of the Collatz_conjecture.
Mistral Verbal Summary
 
Prompt: Compute the area of the region enclosed by the graphs of the given equations “y=x, y=2x, and y=6-x”. Use vertical cross-sections.
 

Problem source: UC Davis

Mistral Small 3 Math Performance

Mistral Small 3 Math

The answer provided by Mistral Small 3 is incorrect. The correct answer is 3 (or 3 square units).
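That value can be verified with vertical cross-sections: the upper boundary is y = 2x from x = 0 to x = 2 (where y = 2x meets y = 6 − x), then y = 6 − x from x = 2 to x = 3; the lower boundary is y = x throughout. A quick sketch evaluating the antiderivatives of the two integrands:

```python
# Area via vertical cross-sections, split at x = 2 where the top boundary changes.
def F1(x):
    # Antiderivative of (2x - x) = x on [0, 2]
    return x**2 / 2

def F2(x):
    # Antiderivative of ((6 - x) - x) = 6 - 2x on [2, 3]
    return 6 * x - x**2

area = (F1(2) - F1(0)) + (F2(3) - F2(2))
print(area)  # → 3.0
```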

Prompt: Quelles sont les sept merveilles du monde? (“What are the seven wonders of the world?”)

Overall, our perception of Mistral Small 3 was that it excelled at verbal analysis and translation, but delivered mixed results for coding and math prompts.
 

Chart your own AI reality with Ori

Ori Global Cloud provides flexible infrastructure for any team, model, and scale. Backed by top-tier GPUs, performant storage, and AI-ready networking, Ori enables growing AI businesses and enterprises to deploy their AI models and applications in a variety of ways:

  • GPU instances, on-demand virtual machines backed by top-tier GPUs to run AI workloads.
  • Inference Endpoints to run and scale your favorite open source models with just one click.
  • GPU Clusters to train and serve your most ambitious AI models.
  • Serverless Kubernetes to run inference at scale without having to manage infrastructure.
  • Private Cloud provides a single-tenant environment, top-notch performance and secure infrastructure for enterprise AI builds.

 

 
