Showing 1-2 of 2 projects
A high-performance Python library for serving large language models (LLMs), with GPU acceleration and distributed inference.
A Python library for selecting and tuning inference engines to optimize AI inference performance on GPUs.