RWKV is an RNN-based language model architecture that combines the efficient inference of an RNN with the parallelizable training and performance of a transformer.
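To make that trade-off concrete, here is a minimal Python sketch of the RNN-style recurrence idea behind such models: a fixed-size state is updated per token, so inference cost does not grow with context length. It is a simplified linear-attention recurrence, not RWKV's actual time-mixing formulation, and all dimensions are illustrative.

```python
# Simplified sketch of the RNN-style recurrence idea behind models like
# RWKV: each step folds the new token into a fixed-size state instead of
# attending over all past tokens, so per-token inference cost is O(1) in
# sequence length. This is NOT RWKV's actual time-mixing math; the decay
# term and dimensions are illustrative assumptions.
import numpy as np

def recurrent_step(state, k, v, decay=0.95):
    """Update a (d, d) outer-product state with one token's key/value."""
    return decay * state + np.outer(k, v)

def readout(state, q):
    """Read the state with a query vector, analogous to attention output."""
    return q @ state  # (d,) @ (d, d) -> (d,)

d = 8
state = np.zeros((d, d))
rng = np.random.default_rng(0)
for _ in range(16):                 # stream 16 tokens
    k, v, q = rng.normal(size=(3, d))
    state = recurrent_step(state, k, v)
    out = readout(state, q)         # constant work per token
print(out.shape)                    # (8,)
```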
A high-performance key-value (KV) cache layer for low-latency large language model (LLM) inference.
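For background, a hedged Python sketch of why a KV cache speeds up autoregressive decoding: keys and values for past tokens are computed once and reused, so each new token only attends against the cached tensors. The projection weights and shapes below are illustrative, not taken from any particular project.

```python
# Illustrative sketch of KV caching in autoregressive decoding: past
# tokens' keys/values are cached once, so each step computes attention
# against the cache instead of recomputing the whole prefix.
import numpy as np

def attend(q, K, V):
    """Single-head scaled dot-product attention for one query vector."""
    scores = K @ q / np.sqrt(q.shape[-1])  # (t,)
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V                           # (d,)

d = 16
rng = np.random.default_rng(0)
Wq, Wk, Wv = rng.normal(size=(3, d, d))    # toy projection weights

K_cache, V_cache = [], []                  # grows by one row per token
for step in range(8):
    x = rng.normal(size=d)                 # current token's hidden state
    K_cache.append(x @ Wk)                 # cache this token's key...
    V_cache.append(x @ Wv)                 # ...and value, never recompute
    out = attend(x @ Wq, np.array(K_cache), np.array(V_cache))
print(out.shape)                           # (16,)
```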
This project speeds up large language model (LLM) inference and improves models' perception of key information through prompt and KV-cache compression.
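As a rough illustration of the prompt-compression side, the sketch below prunes low-importance tokens while preserving their order. The importance scores here are a toy stand-in; real compressors derive them from a small language model rather than hand-set values.

```python
# Hedged sketch of prompt compression by token pruning: score each prompt
# token for importance and keep only the top fraction, preserving order.
# The scores below are hand-set toy values, not a real scoring method.
def compress_prompt(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring tokens, preserving original order."""
    k = max(1, int(len(tokens) * keep_ratio))
    keep = sorted(sorted(range(len(tokens)), key=lambda i: -scores[i])[:k])
    return [tokens[i] for i in keep]

tokens = "please summarize the following quarterly report for me".split()
scores = [0.1, 0.9, 0.2, 0.3, 0.8, 0.9, 0.1, 0.2]  # toy importance scores
print(compress_prompt(tokens, scores))
# -> ['summarize', 'following', 'quarterly', 'report']
```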
A high-performance, production-ready Redis server and cluster implementation in Go.
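For context, any Redis-compatible server must speak the RESP wire protocol, in which a command arrives as an array of bulk strings. The Python sketch below encodes a command the way a client would send it; it is illustrative and not code from the project above.

```python
# Illustrative encoder for the RESP wire format that Redis-compatible
# servers parse: a command is an array ("*") of bulk strings ("$").
def encode_resp(*args):
    """Encode a command as a RESP array of bulk strings."""
    out = f"*{len(args)}\r\n".encode()
    for a in args:
        b = a.encode() if isinstance(a, str) else a
        out += b"$%d\r\n%s\r\n" % (len(b), b)
    return out

print(encode_resp("SET", "greeting", "hello"))
# b'*3\r\n$3\r\nSET\r\n$8\r\ngreeting\r\n$5\r\nhello\r\n'
```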
Unified compression methods for KV caching in autoregressive language models like GPT-3.
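One common KV-cache compression technique is quantizing cached keys and values to 8-bit integers. The sketch below shows the general memory/accuracy trade-off with per-tensor symmetric quantization; it is an illustrative assumption, not any specific project's method.

```python
# Hedged sketch of one common KV-cache compression technique: per-tensor
# symmetric int8 quantization of cached keys/values, trading a 4x memory
# saving for a small reconstruction error. Shapes are illustrative.
import numpy as np

def quantize(x):
    """Symmetric int8 quantization; returns codes and the scale."""
    scale = np.abs(x).max() / 127.0
    return np.round(x / scale).astype(np.int8), scale

def dequantize(codes, scale):
    return codes.astype(np.float32) * scale

rng = np.random.default_rng(0)
K = rng.normal(size=(1024, 64)).astype(np.float32)  # cached keys, fp32
codes, scale = quantize(K)                          # store 1 byte/element
K_hat = dequantize(codes, scale)
print(codes.nbytes / K.nbytes)                      # 0.25: 4x smaller
print(np.abs(K - K_hat).max())                      # small reconstruction error
```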