Mistral AI has established itself as a key player in the generative AI space by developing open, portable, and highly customizable models. A defining strength of their model portfolio is the ability to deliver exceptional performance with relatively compact models, such as the Pixtral 12B and the Ministral series. With the launch of Mistral’s Small 3 model, they have taken it a step further by delivering a compact yet performant model for a variety of language, coding and math tasks.
Mistral Small 3 supports multilingual inputs and output in 11 languages: English, French, German, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, and Polish. Mistral is looking to target a range of business use cases with this model: fraud detection, virtual customer service, sentiment analysis, feedback systems, customer triaging, and more.
Here’s a quick rundown of the Mistral Small 3 24B specifications:

| Mistral Small 3 24B | |
| --- | --- |
| Architecture | Designed for low latency; Tekken tokenizer with a 131k vocabulary |
| Variants | Pretrained base model and instruction-tuned |
| Parameters | 24 billion |
| Context window | 32k tokens |
| Licensing | Apache 2.0 |
Benchmark results shared by Mistral indicate that Small 3 outperforms GPT-4o mini and Gemma 2 27B, and comes close to Llama 3.3 70B and Qwen 2.5 32B in performance.
How to use Mistral Small 3 with vLLM on an Ori virtual machine
Prerequisites
Create a GPU virtual machine (VM) on Ori Global Cloud. We chose the NVIDIA H100 SXM GPU because the vLLM server for Mistral Small 3 Instruct in BF16 format needs about 71 GB of memory, which includes the model weights, PyTorch activations, and the KV cache. If you’re using smaller, quantized versions of the model, we recommend the NVIDIA L40S GPU for its strong inference performance.
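The ~71 GB figure is easy to sanity-check: in BF16, each parameter takes 2 bytes, so the weights alone account for roughly 48 GB, with the remainder going to activations and the KV cache. A quick back-of-the-envelope calculation (approximate numbers):

```python
# Rough sanity check of the ~71 GB memory figure (approximate numbers).
params = 24e9        # 24 billion parameters
bytes_per_param = 2  # BF16 stores each parameter in 2 bytes

weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")  # → weights alone: ~48 GB
# The remaining ~23 GB of the ~71 GB total goes to activations and the KV cache.
```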
Step 1: Create and activate a Python virtual environment
apt install python3.11-venv
python3.11 -m venv mistral-env
Activate the virtual environment:
source mistral-env/bin/activate
Step 2: Install the Hugging Face CLI if you haven’t already, and log in. Make sure to have your Hugging Face token ready and to have accepted the model’s terms and conditions.
pip install -U "huggingface_hub[cli]"
huggingface-cli login
Step 3: Install vLLM and update tokenizer
pip install vllm
pip install --upgrade mistral_common
Step 4: Load the model and run the vLLM server. We chose the Mistral Small 24B Instruct variant for its better performance.
vllm serve "mistralai/Mistral-Small-24B-Instruct-2501"
The default port for vLLM is 8000; you can use the --port option to change it.
Note: It'll take a while to load all the model shards for the first time.
Step 5: Query the model from another terminal window, using the JSON utility jq to format model responses
sudo apt install jq
curl -s "http://localhost:8000/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"model": "mistralai/Mistral-Small-24B-Instruct-2501",
"temperature": 0.15,
"messages": [
{"role": "user", "content": "How many times does the letter r appear in the word strawberry?"}
]
}' | jq -r '."choices"[0]."message"."content"'
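If you prefer Python over curl, the same OpenAI-compatible endpoint can be queried with the standard library alone. This is a minimal sketch, assuming the vLLM server from Step 4 is running on localhost:8000:

```python
# Minimal Python client for the vLLM OpenAI-compatible chat endpoint.
# Assumes the server from Step 4 is running locally on port 8000.
import json
import urllib.request

payload = {
    "model": "mistralai/Mistral-Small-24B-Instruct-2501",
    "temperature": 0.15,
    "messages": [
        {"role": "user", "content": "How many times does the letter r appear in the word strawberry?"}
    ],
}

def query(url="http://localhost:8000/v1/chat/completions"):
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Same field jq extracts in the curl example above
    return body["choices"][0]["message"]["content"]
```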
How good is Mistral Small 3?
We threw a few verbal, math, and coding reasoning challenges at Mistral Small 3 24B. The model did quite well on verbal intelligence, but at times it struggled with our coding and math questions.
Prompt: What is the best French cheese?
Mistral Small 3 handled this subjective prompt quite well: it laid out some of the options and their strengths, and let the person asking decide based on those attributes.
Prompt: Translate this sentence to French, German and Italian: “Visit our website to learn more about end-to-end AI infrastructure. Go to www.ori.co.”

The model understood the question correctly and translated the statement with reasonable accuracy.
Prompt: How many ‘r’s in “strawberry” ?
The model got the answer to this question wrong: the word strawberry has 3 ‘r’s.
Prompt: How many ‘l’s in “strawberry”?

Mistral Small 3 answered this prompt incorrectly too.
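Both letter counts are trivial to verify programmatically, which is a handy way to spot-check this class of failure:

```python
# Ground truth for the two letter-counting prompts above
word = "strawberry"
print(word.count("r"))  # → 3
print(word.count("l"))  # → 0
```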
Prompt: Write the game Snake in Python
Creating a simple version of the classic Snake game in Python can be a fun and educational project. We'll use the `tkinter` library for the graphical interface and `random` for generating food positions.
Below is a basic implementation of Snake using `tkinter`:
import tkinter as tk
import random

class SnakeGame:
    def __init__(self, master):
        self.master = master
        self.master.title("Snake Game")
        self.master.resizable(False, False)
        self.canvas = tk.Canvas(self.master, width=400, height=400, bg="black")
        self.canvas.pack()
        self.snake = [(100, 50), (90, 50), (80, 50)]
        self.food = self.spawn_food()
        self.direction = "Right"
        self.canvas.bind_all("<KeyPress>", self.change_direction)
        self.game_loop()

    def change_direction(self, event):
        if event.keysym == "Up" and self.direction != "Down":
            self.direction = "Up"
        elif event.keysym == "Down" and self.direction != "Up":
            self.direction = "Down"
        elif event.keysym == "Left" and self.direction != "Right":
            self.direction = "Left"
        elif event.keysym == "Right" and self.direction != "Left":
            self.direction = "Right"

    def game_loop(self):
        head = self.snake[0]
        if self.direction == "Up":
            new_head = (head[0], head[1] - 10)
        elif self.direction == "Down":
            new_head = (head[0], head[1] + 10)
        elif self.direction == "Left":
            new_head = (head[0] - 10, head[1])
        elif self.direction == "Right":
            new_head = (head[0] + 10, head[1])
        self.snake.insert(0, new_head)
        # Check for collisions
        if (
            new_head in self.snake[len(self.snake) - 1:len(self.snake)]  # Self collision
            or new_head[0] < 0
            or new_head[0] >= 400
            or new_head[1] < 0
            or new_head[1] >= 400
        ):
            self.game_over()
            return
        if new_head == self.food:
            self.food = self.spawn_food()
        else:
            self.snake.pop()
        self.canvas.delete("all")
        self.draw_snake()
        self.canvas.create_oval(self.food[0], self.food[1], self.food[0] + 10, self.food[1] + 10, fill="red")
        self.master.after(100, self.game_loop)

    def draw_snake(self):
        for segment in self.snake:
            self.canvas.create_rectangle(
                segment[0], segment[1], segment[0] + 10, segment[1] + 10, fill="white"
            )

    def spawn_food(self):
        while True:
            food = (random.randint(0, 39) * 10, random.randint(0, 39) * 10)
            if food not in self.snake:
                return food

    def game_over(self):
        self.canvas.create_text(200, 200, text="Game Over", font=("Helvetica", 24), fill="white")

if __name__ == "__main__":
    root = tk.Tk()
    game = SnakeGame(root)
    root.mainloop()
The code ran successfully and generated a basic game of Snake quite well. Here’s a snapshot of the game from the code generated by Mistral Small 3.
Prompt: What is Agentic AI? Explain like I’m five.
Prompt: Write the Tetris game in Python.
The code generated by Mistral failed to run on the first pass.
Prompt: Give me a concise summary of the Collatz_conjecture.
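For readers unfamiliar with it, the Collatz conjecture concerns the map that halves even numbers and sends an odd n to 3n + 1; it asserts that every positive integer eventually reaches 1. A minimal sketch of the iteration:

```python
# The Collatz map: halve even numbers, send odd n to 3n + 1.
# The conjecture says every positive integer eventually reaches 1.
def collatz_steps(n: int) -> int:
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz_steps(27))  # → 111 (a famously long trajectory for a small start)
```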
Prompt: Compute the area of the region enclosed by the graphs of the given equations “y=x, y=2x, and y=6-x”. Use vertical cross-sections.
Problem source: UC Davis


The answer provided by Mistral Small 3 is incorrect. The correct answer is 3 (or 3 square units).
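The correct answer is easy to double-check numerically. Between x = 0 and x = 2 the region is bounded below by y = x and above by y = 2x; between x = 2 and x = 3 the upper boundary becomes y = 6 − x. Summing vertical cross-sections (a midpoint Riemann sum) reproduces the 3 square units:

```python
# Numerical check of the area enclosed by y = x, y = 2x, and y = 6 - x,
# using vertical cross-sections (midpoint Riemann sum over [0, 3]).
def area(n=300_000):
    dx = 3 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * dx
        top = 2 * x if x <= 2 else 6 - x  # upper boundary switches at x = 2
        total += (top - x) * dx           # slice height = top minus bottom (y = x)
    return total

print(round(area(), 3))  # → 3.0
```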
Prompt: Quelles sont les sept merveilles du monde? (“What are the seven wonders of the world?”)

Overall, our impression of Mistral Small 3 was that it excelled at verbal analysis and translation, but delivered mixed results on coding and math prompts.
Chart your own AI reality with Ori
Ori Global Cloud provides flexible infrastructure for any team, model, and scale. Backed by top-tier GPUs, performant storage, and AI-ready networking, Ori enables growing AI businesses and enterprises to deploy their AI models and applications in a variety of ways:
- GPU instances, on-demand virtual machines backed by top-tier GPUs to run AI workloads.
- Inference Endpoints to run and scale your favorite open source models with just one click.
- GPU Clusters to train and serve your most ambitious AI models.
- Serverless Kubernetes helps you run inference at scale without having to manage infrastructure.
- Private Cloud provides a single-tenant environment, top-notch performance and secure infrastructure for enterprise AI builds.