Tutorial

How to run DeepSeek R1 on a cloud GPU with Ollama

Learn how to deploy DeepSeek R1 Distill Llama 70B on an H100 GPU with Ollama and Open WebUI, plus our thoughts on the model and its innovative training approach.

Over the past several months, open source AI models have emerged as a robust alternative to their closed-source counterparts. At Ori, we’ve seen the gap between them narrow over time, first with text-generation models such as Llama 3.3 and Qwen 2.5, and more recently with image and video generation models such as Flux and Genmo. However, the introduction of DeepSeek’s R1 models marks a pivotal moment in the emergence of open source models as a driver of AI innovation.
 
With the launch of its first-generation reasoning models, R1-Zero and R1, DeepSeek has delivered a suite of tools capable of competing with the industry's best in tasks involving mathematics, coding, and logical reasoning. These models represent a fusion of cutting-edge reinforcement learning (RL) techniques and robust fine-tuning methodologies, setting a new benchmark for AI-powered reasoning.
 
Here’s a quick rundown of DeepSeek R1 specifications:
 
| | DeepSeek R1 |
| --- | --- |
| Training approach | Large-scale reinforcement learning (RL) |
| Base models | R1-Zero and R1 |
| Parameters | 671 billion |
| Context length | 128K tokens |
| Distilled models (from Qwen 2.5 and Llama 3.3) | DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Llama-8B, DeepSeek-R1-Distill-Qwen-14B, DeepSeek-R1-Distill-Qwen-32B, DeepSeek-R1-Distill-Llama-70B |
| Licensing | MIT (commercial and research use) |
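
If you'd like to try a distilled model before provisioning a large GPU, Ollama’s model library exposes them under size-based tags (at the time of writing, deepseek-r1:1.5b through deepseek-r1:70b map to the distills above). For example, the 8B distill runs comfortably on a single consumer-grade GPU:

ollama run deepseek-r1:8b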

DeepSeek’s innovative training approach results in powerful efficiency

The DeepSeek team’s efficient training approaches have brought the R1 model impressive performance gains, and we could soon see some of these techniques emulated by other state-of-the-art (SOTA) models such as the upcoming Llama 4.
 
  • The base model, DeepSeek-R1-Zero, was trained with RL directly, without any supervised fine-tuning (SFT).
    • SFT is time-intensive, and skipping it helped DeepSeek move faster.
    • The team used accuracy rewards for reliable, precise verification of results, and format rewards to make the model display its thinking on the way to the result (see the sketch after this list). They avoided neural outcome and process reward models, as these were deemed prone to reward hacking.
    • DeepSeek-R1-Zero enhances its reasoning by leveraging extended test-time computation, generating hundreds to thousands of reasoning tokens to refine its thought process. As computation increases, the model spontaneously develops advanced behaviors such as reflection and exploring alternative problem-solving approaches. One interesting observation during training was an “aha moment”, in which the model re-evaluates its initial approach and vocalizes the realization, making R1-Zero more autonomous and adaptive in advanced problem solving.

  • DeepSeek-R1-Zero nevertheless presented significant challenges: repetitive outputs, inconsistent language usage, and poor readability. To address these limitations, DeepSeek introduced DeepSeek-R1, a refined model enhanced with cold-start data. The model was iteratively trained on an SFT dataset of curated Chain-of-Thought (CoT) examples: roughly 600,000 reasoning and 200,000 non-reasoning interactions. This critical step improved the quality, coherence, and accuracy of the model’s outputs, making it a more effective solution for real-world applications.

  • Finally, DeepSeek distilled R1’s reasoning capability into smaller dense models. To equip more efficient models with R1-like reasoning, the team directly fine-tuned open-source models from the Qwen and Llama families on the ~800k samples curated with DeepSeek-R1. Distillation transfers the knowledge and capabilities of a large teacher model (in this case, DeepSeek R1 671B) to a smaller student model (such as Llama 3.3 70B).
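
To make the reward design concrete, here is a minimal sketch of rule-based accuracy and format rewards in Python. It is purely illustrative: DeepSeek has not released its training code, so the tag names, regexes, and equal weighting below are our assumptions based on the paper’s description.

import re

def accuracy_reward(completion: str, ground_truth: str) -> float:
    # Rule-based check: reward 1.0 only if the final answer matches the reference.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match and match.group(1).strip() == ground_truth.strip():
        return 1.0
    return 0.0

def format_reward(completion: str) -> float:
    # Reward the model for exposing its reasoning in <think>...</think>
    # before giving a final answer in <answer>...</answer>.
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>$"
    return 1.0 if re.match(pattern, completion.strip(), re.DOTALL) else 0.0

def total_reward(completion: str, ground_truth: str) -> float:
    # Equal weighting is an assumption; the paper does not specify the mix.
    return accuracy_reward(completion, ground_truth) + format_reward(completion)

Because both signals are simple deterministic rules rather than learned reward models, there is no neural reward for the policy to game, which is exactly why DeepSeek preferred them.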

DeepSeek R1’s benchmark numbers show strong performance, matching or even exceeding frontier models such as OpenAI’s o1.

DeepSeek R1 Performance Benchmarks

The distilled models also compare very well with closed-source models such as OpenAI o1-mini, GPT-4o, and Claude 3.5 Sonnet. The version distilled from Llama 3.3 70B is the best-in-class model of its size and a powerful alternative to ChatGPT and Claude.
 
| Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating |
| --- | --- | --- | --- | --- | --- | --- |
| GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 |
| Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 |
| o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | 1820 |
| QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 |
| DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 |
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 |
| DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 |
| DeepSeek-R1-Distill-Qwen-32B | 72.6 | 83.3 | 94.3 | 62.1 | 57.2 | 1691 |
| DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 |
| DeepSeek-R1-Distill-Llama-70B | 70.0 | 86.7 | 94.5 | 65.2 | 57.5 | 1633 |

DeepSeek R1 Step-by-Step Guide

How to run DeepSeek R1 Distill Llama 70B with Ollama

Prerequisites

Create a GPU virtual machine (VM) on Ori Global Cloud. We chose an NVIDIA H100 SXM GPU with Ubuntu 22.04 as the OS; Debian is also an option.

Quick tip
Use the init script when creating the VM so that NVIDIA CUDA drivers, frameworks such as PyTorch or TensorFlow, and Jupyter Notebook are preinstalled for you.

Step 1: SSH into your VM, install Python, and create a virtual environment
apt install python3.11-venv
python3.11 -m venv ds-r1-env
 
Step 2: Activate the virtual environment
source ds-r1-env/bin/activate
 
Step 3: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
 
Step 4: Run Deepseek R1 70B with Ollama
ollama run deepseek-r1:70b
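
Ollama also exposes a local REST API on port 11434 by default, so once the model is running you can query it from scripts as well as from the interactive prompt. For example, from another terminal on the VM:

curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:70b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'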

Note:
If you’d like to run the full DeepSeek R1 model (671B) with Ollama, you can use the following command:
ollama run deepseek-r1:671b
However, even this 4-bit quantized version needs more than 400 GB of VRAM, which calls for at least six NVIDIA H100 GPUs. We recommend using H200 GPUs instead, as they are equipped with 141 GB of memory each, 76% more than the H100’s 80 GB.
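
As a rough sanity check on those numbers, here is a back-of-the-envelope estimate in Python (the 25% overhead factor is our assumption; actual usage depends on context length and runtime):

params = 671e9                       # total parameter count
bytes_per_param = 0.5                # 4-bit quantization = half a byte per weight
weights_gb = params * bytes_per_param / 1e9   # ~335 GB for the weights alone
total_gb = weights_gb * 1.25         # ~420 GB with KV cache and runtime overhead
print(total_gb / 80)                 # H100: 80 GB each  -> ~5.2, so at least 6 GPUs
print(total_gb / 141)                # H200: 141 GB each -> ~3.0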

Step 5: In another terminal window, install Open WebUI on the VM and run it
pip install open-webui
open-webui serve
 
Step 6: Access Open WebUI in your browser through the default port 8080.
http://<VM-IP>:8080/
 
Click “Get Started” to create an Open WebUI account if you haven’t set one up on this virtual machine before.
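
Open WebUI should detect the local Ollama instance automatically. If the model doesn’t appear in the UI, point Open WebUI at Ollama’s API explicitly when starting it:

OLLAMA_BASE_URL=http://127.0.0.1:11434 open-webui serve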

DeepSeek R1 with OpenWebUI

Step 7: Choose deepseek-r1:70b from the model dropdown and chat away!
 

DeepSeek R1 UI

How good is DeepSeek R1?

We threw several questions at DeepSeek R1 Distill Llama 70B, covering verbal, math, and coding reasoning. R1 aced them all, providing incredible detail about its thought process.
 
Prompt: Write the game “Snake” in Python

DeepSeek Coding Example

import pygame
import random
import time

# Initialize Pygame
pygame.init()

# Set up the game window dimensions
window_width = 800
window_height = 600
window = pygame.display.set_mode((window_width, window_height))
pygame.display.set_caption("Snake Game")

# Colors
BLACK = (0, 0, 0)
GREEN = (0, 255, 0)
RED = (255, 0, 0)

# Snake and food settings
snake_block = 20
snake_speed = 15

# Initialize clock for game speed control
clock = pygame.time.Clock()

# Font for displaying score
font = pygame.font.SysFont(None, 50)

def our_snake(snake_block, snake_list):
    for x in snake_list:
        pygame.draw.rect(window, GREEN, [x[0], x[1], snake_block, snake_block])

def get_new_food(snake_list):
    food_x = round(random.randrange(0, window_width - snake_block) / 20.0) * 20.0
    food_y = round(random.randrange(0, window_height - snake_block) / 20.0) * 20.0
    # Ensure the new food does not appear on top of the snake
    while [food_x, food_y] in snake_list:
        food_x = round(random.randrange(0, window_width - snake_block) / 20.0) * 20.0
        food_y = round(random.randrange(0, window_height - snake_block) / 20.0) * 20.0
    return food_x, food_y

def game_loop():
    game_over = False
    game_close = False

    # Initial snake position
    x1 = window_width / 4
    y1 = window_height / 2
    x1_change = 0
    y1_change = 0

    snake_List = []
    Length_of_snake = 1
    food_x, food_y = get_new_food(snake_List)

    while not game_over:
        while game_close:
            window.fill(BLACK)
            score_msg = font.render("You Lost! Press Q-Quit or C-Play Again", True, (255, 255, 255))
            window.blit(score_msg, [window_width / 6, window_height / 3])
            pygame.display.update()
            for event in pygame.event.get():
                if event.type == pygame.KEYDOWN:
                    if event.key == pygame.K_q:
                        game_over = True
                        game_close = False
                    if event.key == pygame.K_c:
                        game_loop()

        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                game_over = True
            if event.type == pygame.KEYDOWN:
                if event.key == pygame.K_LEFT:
                    x1_change = -snake_block
                    y1_change = 0
                elif event.key == pygame.K_RIGHT:
                    x1_change = snake_block
                    y1_change = 0
                elif event.key == pygame.K_UP:
                    y1_change = -snake_block
                    x1_change = 0
                elif event.key == pygame.K_DOWN:
                    y1_change = snake_block
                    x1_change = 0

        if x1 >= window_width or x1 < 0 or y1 >= window_height or y1 < 0:
            game_close = True

        x1 += x1_change
        y1 += y1_change
        window.fill(BLACK)
        pygame.draw.rect(window, RED, [food_x, food_y, snake_block, snake_block])

        snake_Head = []
        snake_Head.append(x1)
        snake_Head.append(y1)
        snake_List.append(snake_Head)
        if len(snake_List) > Length_of_snake:
            del snake_List[0]

        for x in snake_List[:-1]:
            if x == snake_Head:
                game_close = True

        our_snake(snake_block, snake_List)
        pygame.display.update()

        if x1 == food_x and y1 == food_y:
            food_x, food_y = get_new_food(snake_List)
            Length_of_snake += 1

        clock.tick(snake_speed)

    pygame.quit()
    quit()

# Start the game
game_loop()
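
If you’d like to run the generated script yourself, note that it assumes pygame is installed in your environment:

pip install pygame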

DeepSeek Coding

As we can see, DeepSeek explains the steps behind the game’s code in great detail, making it easy to debug issues or tweak the code further. Here’s a snapshot of the Snake game:
 
 
DeepSeek H100 & H200 GPU
 
We also asked DeepSeek to create Python code for a Tetris game.
 
Prompt: Write the "Tetris" game in Python

DeepSeek Llama 70B Coding

We used the code to generate a Tetris game that achieves all the objectives of classic Tetris.
 
 
Prompt: How many ‘r’s in the word “strawberry”?

DeepSeek Strawberry

Prompt: Compute the area of the region enclosed by the graphs of the equations “y = x, y = 2x, and y = 6 - x”. Use vertical cross-sections.

Problem source: UC Davis

DeepSeek Math Performance

 

DeepSeek R1 correctly solved the problem, arriving at the answer 3, and it consistently demonstrates the ability to solve university-level math problems with remarkable accuracy.
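
For reference, here is the standard worked solution, which matches R1’s reasoning. The three lines intersect at (0, 0), (2, 4), and (3, 3); with vertical cross-sections, the region is bounded above by y = 2x on [0, 2] and by y = 6 - x on [2, 3], and below by y = x throughout:

\[
A = \int_0^2 (2x - x)\,dx + \int_2^3 \bigl((6 - x) - x\bigr)\,dx
  = \left[\tfrac{x^2}{2}\right]_0^2 + \left[6x - x^2\right]_2^3
  = 2 + 1 = 3.
\]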
 

Key advantages of using DeepSeek R1

DeepSeek R1 is transforming generative AI with a smarter, more accessible, and transparent model. Its Chain of Thought approach ensures accurate results by breaking tasks into logical steps, while traceable reasoning builds trust. As an open-source model, it offers high-performance AI without the cost of proprietary systems.
 
  1. Enhanced Problem-Solving: By breaking down tasks into smaller components, the Chain of Thought approach improves the accuracy and reliability of DeepSeek-R1's outputs.

  2. Transparency in Reasoning: Users can trace the logical steps taken by the model, fostering trust and interpretability in its results.

  3. Accessibility Through Open Source: By releasing its models and weights openly, DeepSeek is building a strong ecosystem that will deliver even more efficient models.

  4. Cost-Effective AI Solutions: Open-sourcing high-performance models like DeepSeek-R1 challenges proprietary systems, enabling users to access advanced capabilities without prohibitive costs.

  5. Research-Friendly Design: The availability of distillations ensures scalability, allowing researchers to select models tailored to their computational and task-specific requirements.

Imagine another AI reality. Build it on Ori

Ori Global Cloud provides flexible infrastructure for any team, model, and scale. Backed by top-tier GPUs, performant storage, and AI-ready networking, Ori enables growing AI businesses and enterprises to deploy their AI models and applications in a variety of ways:
 
  • GPU instances, on-demand virtual machines backed by top-tier GPUs to run AI workloads.
  • Inference Endpoints to run and scale your favorite open source models with just one click.
  • Serverless Kubernetes helps you run inference at scale without having to manage infrastructure.
  • Private Cloud delivers flexible, large-scale GPU clusters tailored to power ambitious AI projects.

 
