TensorRT LLM provides a Python API and optimizations to efficiently run large language models on NVIDIA GPUs.
A curated list of awesome papers and code for optimizing LLM/VLM inference performance.
A nearly-live implementation of OpenAI's Whisper, a powerful speech recognition and translation tool.
A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM.
A Python library for optimizing deep learning models for faster inference on deployment platforms like TensorRT.