nCompass simplifies the hosting and acceleration of large-scale, open-source AI models. Founders Aditya Rajagopal and Diederik Vink, Imperial College London PhD graduates who specialized in hardware acceleration of AI models, quickly realized that inference API providers heavily rate-limit users, which hinders both the prototyping and deployment of AI solutions. By focusing on cost efficiency and scalability, nCompass aims to make AI model inference more accessible to developers and businesses.
Reimagining the inference API
After being part of the YC Winter 2024 batch, nCompass secured $1.7M in seed funding and has since focused on building its inference platform. nCompass’ new self-serve inference API, powered by their proprietary technology, optimizes GPU utilization for inference, enabling users to scale up their AI services and handle hundreds of requests per second on a single GPU, while cutting hardware expenses by up to 2x.
nCompass’ initial optimizations ensure that at high request rates, the time-to-first-token (TTFT) doesn’t balloon the way it does with vLLM. Benchmark results shared by nCompass show up to 18x better TTFT than vLLM under heavy load, which means that as their user base scales, the availability and quality of service of their API offering won’t degrade. Alongside the API, nCompass also provides dedicated model instance deployments as well as on-prem deployments that minimize users’ GPU bills.
Source: Benchmark testing provided by nCompass
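For readers curious how a benchmark like this is measured: TTFT is typically captured by sending a streaming request and timing the gap between sending it and receiving the first token. Below is a minimal sketch in Python, assuming an OpenAI-compatible chat completions endpoint; the URL, key, and model name are hypothetical placeholders, not nCompass’ actual API.

```python
import time
import requests

# Hypothetical endpoint, key, and model; substitute your provider's real values.
ENDPOINT = "https://api.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"
MODEL = "example/llama-8b-instruct"

def measure_ttft(prompt: str) -> float:
    """Send a streaming request and return seconds until the first chunk arrives."""
    start = time.perf_counter()
    with requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": MODEL,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,  # stream tokens so the first one can be timed
        },
        stream=True,
        timeout=60,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if line:  # first server-sent-event chunk ~ first token
                return time.perf_counter() - start
    raise RuntimeError("stream ended before any token arrived")

if __name__ == "__main__":
    print(f"TTFT: {measure_ttft('Hello!'):.3f}s")
```

A load test like the one charted above would issue many such requests concurrently and report TTFT percentiles as the request rate climbs.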
Ori Serverless Kubernetes: a balance of simplicity and flexibility
Although Kubernetes is a powerful tool and an industry standard in container orchestration, implementations can become complex very quickly. nCompass experimented with a few cloud providers before choosing Ori Serverless Kubernetes for its balance of simplicity and flexibility.
Unlike other GPU Kubernetes providers, whose fully managed platforms create vendor lock-in, nCompass found Ori’s vanilla Kubernetes with customization options a refreshing change. Ori Serverless Kubernetes’ flexibility is also helping nCompass plan future innovations on their platform, such as on-premise deployments with minimal changes to their API architecture.
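The portability argument is concrete: on vanilla Kubernetes, a GPU inference service is just a standard Deployment that requests an nvidia.com/gpu resource, so the same spec can move between a cloud cluster and an on-premise one. Here is a minimal sketch using the official Kubernetes Python client; the image name and labels are illustrative, not nCompass’ actual setup.

```python
from kubernetes import client, config

config.load_kube_config()  # reads your local kubeconfig

# Illustrative names only; not nCompass' actual image or service.
container = client.V1Container(
    name="llm-server",
    image="registry.example.com/llm-server:latest",
    ports=[client.V1ContainerPort(container_port=8000)],
    resources=client.V1ResourceRequirements(
        limits={"nvidia.com/gpu": "1"},  # standard GPU resource request
    ),
)

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="llm-server"),
    spec=client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "llm-server"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "llm-server"}),
            spec=client.V1PodSpec(containers=[container]),
        ),
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
```

Because nothing here is provider-specific, moving between clusters is largely a matter of pointing the client at a different kubeconfig.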
Achieving inference economics at scale with Ori
The nCompass team started with virtual machines for internal testing and prototyping, but quickly realized that Ori Serverless Kubernetes was the right fit for the economics of an inference API service: it combines access to a large pool of GPUs, usage-based pricing, and the ability to scale up and down as needed.
“Serverless Kubernetes was a natural fit for our API service; it allows us to scale compute capacity as demand bursts and optimize costs,” says Aditya.
nCompass also values the per-minute pricing model for making their operations more dynamic, paying only for the capacity they actually use.
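The appeal of per-minute, usage-based billing is easy to see with rough arithmetic: if traffic bursts only a few hours a day, paying per minute for burst capacity beats reserving peak capacity around the clock. The sketch below illustrates the idea; all prices and GPU counts are made-up assumptions, not Ori’s actual rates.

```python
# Back-of-the-envelope comparison of always-on vs. scale-with-demand GPU costs.
# All figures are hypothetical illustrations, not Ori's actual pricing.

PRICE_PER_GPU_MINUTE = 0.05   # assumed per-minute rate, in dollars
PEAK_GPUS = 8                 # GPUs needed during traffic bursts
BASELINE_GPUS = 2             # GPUs needed the rest of the day
BURST_HOURS_PER_DAY = 4

minutes_per_day = 24 * 60
burst_minutes = BURST_HOURS_PER_DAY * 60

# Always-on: provision for peak, all day long.
always_on = PEAK_GPUS * minutes_per_day * PRICE_PER_GPU_MINUTE

# Scale with demand: baseline all day, extra GPUs only during bursts.
scaled = (
    BASELINE_GPUS * minutes_per_day
    + (PEAK_GPUS - BASELINE_GPUS) * burst_minutes
) * PRICE_PER_GPU_MINUTE

print(f"Always-on:  ${always_on:,.2f}/day")
print(f"Autoscaled: ${scaled:,.2f}/day")  # ~62% cheaper under these assumptions
print(f"Savings:    {100 * (1 - scaled / always_on):.0f}%")
```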
As nCompass accelerates its go-to-market journey, the founders plan to expand their selection of LLMs (including closed-source models), extend their API service to vision models, and further improve the economics of AI inference. Get started with the nCompass inference API!
Chart your own AI reality with Ori Serverless Kubernetes
Ori Serverless Kubernetes is an AI infrastructure service that combines powerful scalability, simple management, and affordability, enabling you to scale into the future of AI.