Triton is a domain-specific programming language and compiler for machine learning and AI workloads.
An optimized cloud and edge inference solution for deploying and running machine learning models.
LightLLM is a high-performance, scalable Python-based framework for inference and serving of large language models.
A collection of generative AI reference workflows optimized for accelerated infrastructure and microservice architectures.
Quantized attention that achieves a 2-5x speedup over FlashAttention for language, image, and video models.
Kernl is a library that lets you run PyTorch transformer models several times faster on GPU with a single line of code.
Distributed compiler based on Triton for parallel systems, focused on AI and high-performance computing.
Triton DataCenter is a cloud management platform with first-class support for containers.
An inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
A Go-based service for auto-discovery and configuration of applications running in containers.