Ori Global Cloud Blog

LLM

Deploy and scale Qwen 2.5 with just one click on Ori Inference Endpoints

Learn how to deploy and scale Qwen 2.5 1.5B effortlessly with Ori Inference Endpoints.

Deepak Manoor Jan 6, 2025

Accelerate Llama 3.1 8B Instruct Inference with TensorRT-LLM

An end-to-end tutorial using Ori's Virtual Machines, Llama 3.1 8B Instruct, and FastAPI for fast batch inference with TensorRT-LLM.

Ciera Fowler Jan 3, 2025

LLM

Deploy and scale LLMs on Ori Serverless Kubernetes with Ollama and Open WebUI

Learn how to deploy LLMs and scale inference on Ori Serverless Kubernetes using Ollama and Open WebUI.

Adrian Matei Aug 22, 2024

How to run Snowflake Arctic Model Inference on NVIDIA H100s

Ready to experience the Snowflake-Arctic-instruct model with Hugging Face? In this blog post we walk you through environment setup, model...

Neha Sharma May 17, 2024

Unveiling a New Benchmarking Framework from Ori

Access BeFOri for Llama 2 and Llama 3 benchmarks on NVIDIA V100 and H100 GPUs

Ciera Fowler May 8, 2024

Blog Post

How to Merge Models for Code-Generating LLMs

Generative AI coding tools are powerful assistants for software developers. Mergekit offers an easy way to blend pre-trained code LLMs and create your own...

Neha Sharma Apr 2, 2024

Blog Post

AI at Scale: Deploy LLMs like Code Llama on Any Cloud

Follow this step-by-step guide to quickly deploy Meta’s Code Llama and other open-source Large Language Models (LLMs), using Python and Hugging Face...

Neha Sharma Sep 25, 2023

Educational content, tutorials and insights on the future of AI infrastructure.