An open-source library for quantizing diffusion models to 4-bit precision, absorbing outliers through low-rank components.
A high-performance Transformer library for accelerating AI models on NVIDIA GPUs, including low-precision support.
Implements the AWQ algorithm for 4-bit quantization, delivering a 2x speedup during inference.
A Python library providing state-of-the-art compression techniques and efficient LLM inference on Intel platforms, aimed at building chatbots quickly.
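The first project's approach, absorbing outliers into a low-rank component before 4-bit quantization, can be sketched in a few lines. This is a minimal illustration of the general technique (an SVD split plus uniform quantization of the residual), not that library's actual implementation; all function names here are hypothetical.

```python
import numpy as np

def quantize_4bit(w, n_bits=4):
    # Uniform symmetric quantization (a common baseline, not any
    # specific library's kernel). Returns the dequantized tensor so
    # the reconstruction error can be measured directly.
    qmax = 2 ** (n_bits - 1) - 1          # 7 for signed 4-bit
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def lowrank_plus_quant(w, rank=8, n_bits=4):
    # Keep a low-rank component in full precision and quantize only
    # the residual. Large-magnitude directions (outliers) land in the
    # low-rank part, so they no longer inflate the quantization scale.
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    lowrank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    residual = w - lowrank
    return lowrank + quantize_4bit(residual, n_bits)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w[0, 0] = 25.0                            # inject a single outlier

err_plain = np.abs(w - quantize_4bit(w)).mean()
err_split = np.abs(w - lowrank_plus_quant(w)).mean()
print(err_plain, err_split)
```

With the outlier present, the plain quantizer's scale is dominated by the single large entry, while the split variant quantizes a residual with a much smaller dynamic range, so its mean reconstruction error is noticeably lower.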