thu-ml/SageAttention

Quantized attention that achieves 2-5x speedup over FlashAttention for language, image, and video models.
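The speedup comes from quantizing the attention inputs so the expensive QK^T matmul runs in low-precision integer arithmetic. This is not SageAttention's actual implementation (the repo ships CUDA/Triton kernels with smoothing and FP8/INT8 paths); below is only a minimal NumPy sketch of the core idea, assuming simple symmetric per-tensor INT8 quantization of Q and K, with softmax and the PV matmul kept in floating point:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization (illustrative, not SageAttention's scheme)."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def quantized_attention(Q, K, V):
    """Attention with Q and K quantized to INT8 before the QK^T matmul.
    The integer product is dequantized with the two scales; softmax and
    the PV matmul then run in float32."""
    d = Q.shape[-1]
    q_int, q_scale = quantize_int8(Q)
    k_int, k_scale = quantize_int8(K)
    # INT8 x INT8 with INT32 accumulation is where hardware speedup comes from
    scores_int = q_int.astype(np.int32) @ k_int.astype(np.int32).T
    scores = scores_int.astype(np.float32) * (q_scale * k_scale) / np.sqrt(d)
    return softmax(scores) @ V

# Compare against full-precision attention on random inputs.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((8, 16)).astype(np.float32) for _ in range(3))
ref = softmax((Q @ K.T) / np.sqrt(16)) @ V
approx = quantized_attention(Q, K, V)
print(np.abs(ref - approx).max())  # small quantization error vs. full precision
```

The real kernels additionally smooth outliers in K before quantizing and fuse these steps into a single GPU kernel, which is where the 2-5x speedup over FlashAttention is measured.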

CUDA · AI & Machine Learning · Inference · Apache-2.0

Stars: 3.2K
Forks: 363
Created: Oct 3, 2024
Last Updated: Jan 17, 2026

Project Analytics

Stars Growth (1 Month): +57 (+1.8%)
Avg Daily Growth (1 Month): +2.0 stars/day
Fork/Star Ratio (All Time): 11.4% (good engagement)
Lifetime Growth: 6.1 stars/day over 520 days

[Charts: Stars, Forks, Open Issues, Pull Requests, and Commits over time]

AI-Generated Tags

attention · efficient-attention · inference-acceleration · llm · llm-infra · quantization · triton · video-generation
