Explore Projects

Discover 1 open source project

Active filters (1):
Search: efficient-attention

Showing 1 of 1 project

thu-ml/SageAttention

Quantized attention that achieves 2-5x speedup over FlashAttention for language, image, and video models.

3.2K
Active
CUDA
Inference
Quantization
PyTorch
#attention #efficient-attention #inference-acceleration
