A flexible framework for optimizing heterogeneous LLM inference and fine-tuning workflows.
Mooncake is the serving platform for Kimi, a leading LLM service from Moonshot AI, with a focus on disaggregated inference and RDMA.
Unified compression methods for KV caching in autoregressive language models like GPT-3.
Efficient communication library for GPUs, covering collectives, point-to-point (P2P), and expert parallelism (EP) for AI/ML workloads.
A redundancy-aware KV cache compression library for improving reasoning model performance.