High-throughput LLM inference engine for developers.
Ray is a unified framework for scaling AI and Python applications with distributed computing and ML libraries.
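For a sense of Ray's core primitive, here is a minimal sketch (assuming a local Ray installation) that fans a plain Python function out across workers:

```python
import ray

ray.init()  # start or connect to a local Ray cluster

@ray.remote
def square(x: int) -> int:
    return x * x

# Launch ten tasks in parallel; ray.get blocks until all finish.
futures = [square.remote(i) for i in range(10)]
print(ray.get(futures))  # [0, 1, 4, ..., 81]
```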
Comprehensive LLM engineering and application resources with training, inference, compression, and deployment guides.
TensorRT-LLM provides a Python API and optimizations to efficiently run large language models on NVIDIA GPUs.
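A hedged sketch of the high-level `LLM` entry point from recent TensorRT-LLM releases; the model name is only an example, and exact signatures vary by version:

```python
from tensorrt_llm import LLM, SamplingParams

# Build (or load) an engine for a Hugging Face model; example model only.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["What does TensorRT-LLM optimize?"], params):
    print(output.outputs[0].text)
```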
Deploy open-source LLMs as OpenAI-compatible API endpoints using BentoML's model serving framework.
Easily run, manage, and scale AI workloads on any infrastructure using a unified platform.
BentoML is an easy-to-use framework for building and deploying production-ready machine learning models as APIs.
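As a minimal sketch of the BentoML 1.2+ service style (class and endpoint names here are illustrative), an API is just a decorated Python class:

```python
import bentoml

@bentoml.service
class Echo:
    """A toy service exposing one HTTP endpoint."""

    @bentoml.api
    def echo(self, text: str) -> str:
        return text
```

Running `bentoml serve` against this file starts a local HTTP server for the endpoint.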
Superduper is an end-to-end framework for building custom AI applications and agents using Python, PyTorch, and Transformers.
Optimize AI inference performance on GPUs with this Python library for selecting and tuning inference engines.
Multi-LoRA inference server that scales to thousands of fine-tuned LLMs.
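Multi-LoRA servers typically expose an OpenAI-compatible API in which the requested `model` names a fine-tuned adapter that is hot-swapped over a shared base model. A hedged client-side sketch (the endpoint URL and adapter ID are hypothetical):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="-")

resp = client.chat.completions.create(
    model="billing-summarizer-lora",  # hypothetical per-tenant adapter ID
    messages=[{"role": "user", "content": "Summarize this invoice."}],
)
print(resp.choices[0].message.content)
```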
High-performance inference and deployment toolkit for LLMs and VLMs.
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
A community-maintained hardware plugin for running large language models (LLMs) on Ascend accelerators.
RayLLM is a framework for serving large language models (LLMs) on the Ray distributed computing platform.
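RayLLM deployments run on top of Ray Serve behind an OpenAI-compatible HTTP API; a hedged sketch of querying one (the URL and model ID are assumptions for illustration):

```python
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-2-7b-chat-hf",  # assumed model ID
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```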
Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere.
RTP-LLM is a high-performance LLM inference engine from Alibaba for diverse AI applications.