AI
Accelerate Llama 3.1 8B Instruct Inference with TensorRT LLM
An end-to-end tutorial using Ori's Virtual Machines, Llama 3.1 8B Instruct, and FastAPI for speedy batch inference with TensorRT LLM.