High-throughput LLM inference engine for developers
High-performance serving framework for large language and multimodal models
TensorRT-LLM provides a Python API and GPU-specific optimizations for running large language models efficiently on NVIDIA GPUs.
A high-performance Transformer library for accelerating AI models on NVIDIA GPUs, including low-precision support.
Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere.