Explore Projects

Discover 321 open-source projects

Active filters (1):
Search: inference

Showing 101-120 of 321 projects

tencentmusic/cube-studio

An open-source cloud-native AI platform for ML/DL workflows, model serving, and distributed training.

4.9K
Stable
Python
MLOps
BaaS Platforms
PyTorch
#ai-platform #mlops #model-serving

NVIDIA-AI-IOT/torch2trt

An easy-to-use PyTorch to TensorRT converter for optimizing AI model inference on NVIDIA Jetson devices.

4.9K
Archived
Python
Inference
API Frameworks
PyTorch
#pytorch #tensorrt #jetson

blei-lab/edward

Edward is a probabilistic programming language in TensorFlow for deep generative models and variational inference.

4.8K
Archived
Jupyter Notebook
LLM Frameworks
Data Science
TensorFlow
#bayesian-methods #deep-learning #neural-networks

facebookincubator/AITemplate

AITemplate is a Python framework for rendering neural networks into high-performance CUDA/HIP C++ code, optimized for GPU inference.

4.7K
Active
Python
Inference
MLOps
Python
#cuda #hip #c++

vllm-project/aibrix

Infrastructure components for cost-efficient GenAI model inference and deployment.

4.7K
Active
Go
AI Model Serving
Infrastructure as Code
Go
#llm-inference #genai-infrastructure #model-serving

gpustack/gpustack

An open-source GPU cluster manager for running AI models, which selects and configures inference backends to optimize performance across GPUs.

4.6K
Active
Python
Inference
CLI Tools
Python
#ai-inference #gpu-acceleration #performance-optimization

huggingface/text-embeddings-inference

A blazing fast inference solution for text embeddings models built with Rust.

4.6K
Active
Rust
LLM Frameworks
Inference
#embeddings #inference #text-processing

py-why/EconML

A toolkit for automated causal inference and econometric analysis, combining machine learning and econometrics.

4.5K
Active
Jupyter Notebook
MLOps
Caching
Python
#causal-inference #econometrics #machine-learning

BlockRunAI/ClawRouter

Smart LLM router for AI inference cost optimization.

4.5K
Active
TypeScript
AI Editors/Agents/Copilot
#LLM #AI Routing #Inference Cost Optimization

plasma-umass/coz

Coz is a causal profiler for C/C++ that helps developers optimize performance by identifying bottlenecks.

4.5K
Active
C
#causal-profiling #performance-analysis #performance-optimization

turboderp-org/exllamav2

A fast inference library for running large language models (LLMs) locally on modern GPUs.

4.5K
Stable
Python
LLM Frameworks
CLI Tools
Python
#machine-learning #inference #llm

huawei-noah/Efficient-AI-Backbones

Efficient AI model backbones developed by Huawei's Noah's Ark Lab, including GhostNet, TNT (Transformer in Transformer), and MLP variants.

4.4K
Experimental
Python
Computer Vision
Model Compression
PyTorch
#convolutional-neural-networks #efficient-inference #ghostnet

openvinotoolkit/open_model_zoo

A collection of pre-trained deep learning models and demos optimized for high performance using the OpenVINO toolkit.

4.4K
Active
Python
Inference
MLOps
PyTorch
#deep-learning #model-zoo #openvino

OpenNMT/CTranslate2

Fast C++ inference engine for Transformer models, supporting CUDA, MKL, and other optimizations.

4.3K
Active
C++
Inference
API Frameworks
#deep-learning #machine-translation #neural-machine-translation

Lightricks/LTX-2

A Python inference and LoRA-training package for the LTX-2 model.

4.3K
Active
Python
Generative AI
PyTorch
#LTX-2 #LoRA #Generative-AI

FedML-AI/FedML

A unified and scalable ML library for large-scale distributed training, model serving, and federated learning.

4.0K
Stable
Python
MLOps
Inference
React
#ai #machine-learning #federated-learning

ModelTC/LightLLM

LightLLM is a high-performance, scalable Python-based framework for inference and serving of large language models.

3.9K
Active
Python
LLM Frameworks
API Frameworks
#llm #model-serving #deep-learning

skyzh/tiny-llm

A course for systems engineers on building a tiny vLLM-style inference and serving stack for Qwen models on Apple Silicon.

3.9K
Stable
Python
LLM Frameworks
LLM Wrappers & SDKs
Python
#llm #qwen #vllm

NVIDIA/GenerativeAIExamples

Collection of generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

3.8K
Active
Jupyter Notebook
LLM Frameworks
Inference
React
#gpu-acceleration #large-language-models #microservice

Lightning-AI/LitServe

A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling.

3.8K
Active
Python
AI Model Serving
API Frameworks
FastAPI
#ai #inference #serving
