High-throughput LLM inference engine for developers.
Ray is a unified framework for scaling AI and Python applications with distributed computing and ML libraries.
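For a sense of Ray's core primitive, here is a minimal sketch (assuming a local Ray installation) that fans a plain Python function out across workers:

```python
import ray

ray.init()  # start or connect to a local Ray cluster

@ray.remote
def square(x: int) -> int:
    return x * x

# Launch ten tasks in parallel; ray.get blocks until all finish.
futures = [square.remote(i) for i in range(10)]
print(ray.get(futures))  # [0, 1, 4, ..., 81]
```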
Comprehensive LLM engineering and application resources with training, inference, compression, and deployment guides.
TensorRT-LLM provides a Python API and optimizations to efficiently run large language models on NVIDIA GPUs.
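A hedged sketch of the high-level `LLM` entry point from recent TensorRT-LLM releases; the model name is only an example, and exact signatures vary by version:

```python
from tensorrt_llm import LLM, SamplingParams

# Build (or load) an engine for a Hugging Face model; example model only.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["What does TensorRT-LLM optimize?"], params):
    print(output.outputs[0].text)
```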
Deploy open-source LLMs as OpenAI-compatible API endpoints using BentoML's model serving framework.
Easily run, manage, and scale AI workloads on any infrastructure using a unified platform.
BentoML is an easy-to-use framework for building and deploying production-ready machine learning models as APIs.
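As a minimal sketch of the BentoML 1.2+ service style (class and endpoint names here are illustrative), an API is just a decorated Python class:

```python
import bentoml

@bentoml.service
class Echo:
    """A toy service exposing one HTTP endpoint."""

    @bentoml.api
    def echo(self, text: str) -> str:
        return text
```

Running `bentoml serve` against this file starts a local HTTP server for the endpoint.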
Superduper is an end-to-end framework for building custom AI applications and agents using Python, PyTorch, and Transformers.
Optimize AI inference performance on GPUs with this Python library for selecting and tuning inference engines.
Multi-LoRA inference server that scales to thousands of fine-tuned LLMs.
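Multi-LoRA servers typically expose an OpenAI-compatible API in which the requested `model` names a fine-tuned adapter that is hot-swapped over a shared base model. A hedged client-side sketch (the endpoint URL and adapter ID are hypothetical):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

resp = client.chat.completions.create(
    model="billing-summarizer-lora",  # hypothetical per-tenant adapter ID
    messages=[{"role": "user", "content": "Summarize this invoice."}],
)
print(resp.choices[0].message.content)
```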
High-performance inference and deployment toolkit for LLMs and VLMs.
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
A community-maintained hardware plugin for running large language models (LLMs) on Ascend accelerators.
RayLLM is a framework for serving large language models (LLMs) on the Ray distributed computing platform.
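RayLLM deployments run on top of Ray Serve behind an OpenAI-compatible HTTP API; a hedged sketch of querying one (the URL and model ID are assumptions for illustration):

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-2-7b-chat-hf",  # assumed model ID
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```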
Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere.
RTP-LLM is a high-performance LLM inference engine from Alibaba for diverse AI applications.