High-throughput LLM inference engine for developers
Python SDK and Proxy Server for calling 100+ LLM APIs with cost tracking and guardrails
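As an illustration of the unified-API pattern such an SDK provides, here is a minimal sketch of routing one call signature to multiple provider backends. All names here (`complete`, `BACKENDS`, the stub functions) are hypothetical, not the project's actual API:

```python
# Minimal sketch of a unified LLM API layer (hypothetical names, not
# the project's real interface): one call signature, many providers.

def _openai_backend(prompt: str) -> str:
    # Stand-in for a real provider call; a real SDK would issue an
    # HTTP request here and track token cost on the response.
    return f"[openai] {prompt}"

def _anthropic_backend(prompt: str) -> str:
    return f"[anthropic] {prompt}"

# Provider registry keyed by the "provider/model" prefix.
BACKENDS = {
    "openai": _openai_backend,
    "anthropic": _anthropic_backend,
}

def complete(model: str, prompt: str) -> str:
    """Route a completion request to the backend named in `model`
    (e.g. 'openai/gpt-4o'), using a single shared signature."""
    provider, _, _name = model.partition("/")
    if provider not in BACKENDS:
        raise ValueError(f"unknown provider: {provider}")
    return BACKENDS[provider](prompt)

print(complete("openai/gpt-4o", "hello"))     # dispatched to the openai stub
print(complete("anthropic/claude", "hello"))  # dispatched to the anthropic stub
```

The point of the pattern is that adding a provider only extends the registry; callers never change.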
An AI agent toolkit with a coding agent CLI, unified LLM API, TUI & web UI libraries, and Slack bot for vibe coders.
A comprehensive guide to building with the LLaMA language model, covering inference, fine-tuning, and end-to-end solutions.
A Python library for working with large language models (LLMs) and natural language processing tasks.
An open-source, scalable, and high-performance RL framework for building AI-powered applications and tools.
Unified, production-ready inference API to run open-source, speech, and multimodal models on cloud, on-prem, or your laptop.
An accelerator for local LLM inference and fine-tuning on Intel XPUs, with seamless integration into popular LLM frameworks.
A high-performance key-value (KV) cache layer that speeds up LLM inference.
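The speedup from a KV cache layer comes from reusing previously computed attention keys and values instead of recomputing them for every token. A toy sketch of the reuse pattern (illustrative only, not this project's code; real caches store per-layer attention tensors keyed by sequence prefix, not simple token IDs):

```python
# Toy illustration of key-value caching for autoregressive decoding
# (conceptual only; real KV caches hold per-layer attention tensors).

compute_calls = 0

def expensive_kv(token_id: int) -> tuple[int, int]:
    """Stand-in for computing a token's attention key/value tensors."""
    global compute_calls
    compute_calls += 1
    return (token_id * 2, token_id * 3)  # fake "key" and "value"

cache: dict[int, tuple[int, int]] = {}

def kv_for(token_id: int) -> tuple[int, int]:
    # On a cache hit the expensive computation is skipped entirely --
    # this skip is the source of the inference speedup.
    if token_id not in cache:
        cache[token_id] = expensive_kv(token_id)
    return cache[token_id]

# Process the same prefix twice: the second pass is all cache hits.
for token in [1, 2, 3, 1, 2, 3]:
    kv_for(token)
print(compute_calls)  # 3 computations for 6 lookups
```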
A low-code MCP framework for building RAG pipelines with AI tools.
Structured data extraction and instruction calling with ML models, LLMs, and vision LLMs for AI developers.
A curated list of awesome papers and code for optimizing LLM/VLM inference performance
Mooncake is the serving platform for Kimi, an LLM service from Moonshot AI, focused on disaggregated inference and RDMA-based data transfer.
Infrastructure components for cost-efficient GenAI model inference and deployment
Optimize AI inference performance on GPUs with this Python library for selecting and tuning inference engines.
Comprehensive open-source library of AI research and engineering skills for any AI model.
A course for systems engineers on building a tiny vLLM-style engine and serving Qwen inference on Apple Silicon.
High-performance Inference and Deployment Toolkit for LLMs and VLMs
A semantic router system for deploying and managing a mixture of AI models across cloud, data center, and edge.
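The core idea of semantic routing is choosing which model serves a request based on the request's content. A minimal keyword-based sketch (hypothetical; real semantic routers typically match on embedding similarity rather than literal keywords):

```python
# Toy semantic router (illustrative only, not this project's code):
# pick a serving model for a request by inspecting its content.

ROUTES = {
    "code": "code-model",   # hypothetical model names
    "math": "math-model",
}
DEFAULT_MODEL = "general-model"

def route(query: str) -> str:
    """Return the model name that should serve this query."""
    q = query.lower()
    for keyword, model in ROUTES.items():
        if keyword in q:
            return model
    return DEFAULT_MODEL

print(route("fix this Python code"))   # code-model
print(route("solve this math proof"))  # math-model
print(route("tell me a story"))        # general-model
```

A production router would also weigh deployment location (cloud vs. edge), cost, and load, not just topic.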
A Python framework for efficient model inference with omni-modality AI models.