Quantized attention that achieves a 2-5x speedup over FlashAttention for language, image, and video models.
A Python library for accelerating inference of video diffusion models using timestep embedding caching.