Deploy and scale LLMs on Ori Serverless Kubernetes with Ollama and Open WebUI
Learn how to deploy LLMs and scale inference on Ori Serverless Kubernetes using Ollama and Open WebUI.