Explore Projects

Discover 16 open source projects

Active filters (1): Search: llm-serving

Showing 1-16 of 16 projects

vllm-project/vllm

High-throughput LLM inference engine for developers

72.1K
Active
Python
Inference
LLM Wrappers & SDKs
Hugging Face
#llm #inference #ai

ray-project/ray

Ray is a unified framework for scaling AI and Python applications with distributed computing and ML libraries.

41.6K
Active
Python
ML Ops
Containerization
Python
#distributed-computing #ml-ops #ai-framework

liguodongiot/llm-action

Comprehensive LLM engineering and application resources with training, inference, compression, and deployment guides

23.4K
Stable
HTML
Fine-tuning
Inference
#llm-training #llm-inference #llm-ops

NVIDIA/TensorRT-LLM

TensorRT LLM provides a Python API and optimizations to efficiently run large language models on NVIDIA GPUs.

13.0K
Active
Python
LLM Frameworks
PyTorch
#cuda #llm-serving #moe

bentoml/OpenLLM

Deploy open-source LLMs as OpenAI-compatible API endpoints using BentoML's model serving framework.

12.1K
Active
Python
AI Model Serving
Local Inference Engines
BentoML
#llm-inference #bentoml #model-serving

skypilot-org/skypilot

Easily run, manage, and scale AI workloads on any infrastructure using a unified platform.

9.5K
Active
Python
ML Ops
Python
#cloud-computing #cloud-management #cost-optimization

bentoml/BentoML

BentoML is an easy-to-use framework for building and deploying production-ready machine learning models as APIs.

8.5K
Active
Python
LLM Frameworks
API Clients & Testing
Python
#ai-inference #llm-inference #llm-serving

superduper-io/superduper

Superduper is an end-to-end framework for building custom AI applications and agents using Python, PyTorch, and Transformers.

5.3K
Stable
Python
LLM Frameworks
Agents & Orchestration
PyTorch
#ai #chatbot #mlops

gpustack/gpustack

Optimize AI inference performance on GPUs with this Python library for selecting and tuning inference engines.

4.6K
Active
Python
Inference
CLI Tools
Python
#ai-inference #gpu-acceleration #performance-optimization

predibase/lorax

Multi-LoRA inference server that scales to thousands of fine-tuned LLMs

3.7K
Experimental
Python
LLM Frameworks
BaaS Platforms
PyTorch
#llm #fine-tuning #model-serving

PaddlePaddle/FastDeploy

High-performance Inference and Deployment Toolkit for LLMs and VLMs

3.7K
Active
Python
PaddlePaddle
#inference #deployment #llms

thu-pacman/chitu

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

3.4K
Active
Python
LLM Frameworks
API Frameworks
PyTorch
#llm #inference #gpu

vllm-project/vllm-ascend

A community-maintained hardware plugin for running large language models (LLMs) on Ascend accelerators.

1.7K
Active
C++
LLM Frameworks
Inference
#ascend #llm-serving #llmops

ray-project/ray-llm

RayLLM is a framework for serving large language models (LLMs) on the Ray distributed computing platform.

1.3K
Experimental
LLM Frameworks
API Frameworks
#llm #distributed-computing #ray

GradientHQ/parallax

Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere.

1.1K
Active
Python
LLM Frameworks
API Frameworks
PyTorch
#distributed-systems #decentralized-inference #llm-serving

alibaba/rtp-llm

RTP-LLM is a high-performance LLM inference engine from Alibaba for diverse AI applications.

1.1K
Active
Cuda
LLM Frameworks
LLM Inference
CUDA
#gpt #llama #llm
