A library for efficient weight quantization of large language models to accelerate inference on edge devices.
Optimizes large language models for low-bit precision and sparsity, improving model compression.
Implements the AWQ (activation-aware weight quantization) algorithm for 4-bit quantization, achieving a 2x speedup during inference.
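To make the 4-bit idea concrete, here is a minimal sketch of round-to-nearest int4 weight quantization with per-output-channel scales. It is not the full AWQ algorithm (which additionally rescales salient channels using activation statistics) and does not reflect any of these projects' actual APIs; the function names and the numpy dependency are illustrative assumptions.

```python
# Minimal sketch: symmetric round-to-nearest int4 quantization with
# per-output-channel scales. NOT the AWQ algorithm itself, which also
# applies activation-aware channel rescaling before quantizing.
import numpy as np

def quantize_4bit(w: np.ndarray):
    """Quantize a 2D weight matrix to signed 4-bit codes, one scale per row."""
    # Signed int4 range is [-8, 7]; pick each row's scale so its largest
    # absolute weight maps to +/-7.
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_4bit(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover an approximate float weight matrix from codes and scales."""
    return q.astype(np.float32) * scales

# Usage: quantize a random weight matrix and check reconstruction error.
w = np.random.randn(4, 16).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
print("max abs error:", np.abs(w - w_hat).max())
```

Real implementations additionally pack two int4 codes per byte and fuse dequantization into the matmul kernel; that packing is what yields the memory savings and inference speedups these projects target.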