Explore Projects

Discover 321 open-source projects

Active filters (1):
Search: inference

Showing 101-120 of 321 projects

tencentmusic/cube-studio

An open-source cloud-native AI platform for ML/DL workflows, model serving, and distributed training.

4.9K
Stable
Python
MLOps
BaaS Platforms
PyTorch
#ai-platform #mlops #model-serving

NVIDIA-AI-IOT/torch2trt

An easy-to-use PyTorch to TensorRT converter for optimizing AI model inference on NVIDIA Jetson devices.

4.9K
Archived
Python
Inference
API Frameworks
PyTorch
#pytorch #tensorrt #jetson

blei-lab/edward

Edward is a probabilistic programming language in TensorFlow for deep generative models and variational inference.

4.8K
Archived
Jupyter Notebook
LLM Frameworks
Data Science
TensorFlow
#bayesian-methods #deep-learning #neural-networks

facebookincubator/AITemplate

AITemplate is a Python framework for rendering neural networks into high-performance CUDA/HIP C++ code, optimized for GPU inference.

4.7K
Active
Python
Inference
MLOps
Python
#cuda #hip #c++

vllm-project/aibrix

Infrastructure components for cost-efficient GenAI model inference and deployment.

4.7K
Active
Go
AI Model Serving
Infrastructure as Code
Go
#llm-inference #genai-infrastructure #model-serving

gpustack/gpustack

An open-source GPU cluster manager for running AI models, which selects and configures inference backends to optimize performance across GPUs.

4.6K
Active
Python
Inference
CLI Tools
Python
#ai-inference #gpu-acceleration #performance-optimization

huggingface/text-embeddings-inference

A blazing fast inference solution for text embeddings models built with Rust.

4.6K
Active
Rust
LLM Frameworks
Inference
#embeddings #inference #text-processing

py-why/EconML

A toolkit for automated causal inference and econometric analysis, combining machine learning and econometrics.

4.5K
Active
Jupyter Notebook
MLOps
Caching
Python
#causal-inference #econometrics #machine-learning

BlockRunAI/ClawRouter

Smart LLM router for AI inference cost optimization.

4.5K
Active
TypeScript
AI Editors/Agents/Copilot
#LLM #AI Routing #Inference Cost Optimization

plasma-umass/coz

Coz is a causal profiler for C/C++ that helps developers optimize performance by identifying bottlenecks.

4.5K
Active
C
#causal-profiling #performance-analysis #performance-optimization

turboderp-org/exllamav2

A fast inference library for running large language models (LLMs) locally on modern GPUs.

4.5K
Stable
Python
LLM Frameworks
CLI Tools
Python
#machine-learning #inference #llm

huawei-noah/Efficient-AI-Backbones

Efficient AI model backbones developed by Huawei's Noah's Ark Lab, including GhostNet, TNT (Transformer in Transformer), and MLP variants.

4.4K
Experimental
Python
Computer Vision
Model Compression
PyTorch
#convolutional-neural-networks #efficient-inference #ghostnet

openvinotoolkit/open_model_zoo

A collection of pre-trained deep learning models and demos optimized for high performance using the OpenVINO toolkit.

4.4K
Active
Python
Inference
MLOps
PyTorch
#deep-learning #model-zoo #openvino

OpenNMT/CTranslate2

Fast C++ inference engine for Transformer models, supporting CUDA, MKL, and other optimizations.

4.3K
Active
C++
Inference
API Frameworks
#deep-learning #machine-translation #neural-machine-translation

Lightricks/LTX-2

A Python inference and LoRA-training package for the LTX-2 model.

4.3K
Active
Python
Generative AI
PyTorch
#LTX-2 #LoRA #Generative-AI

FedML-AI/FedML

A unified and scalable ML library for large-scale distributed training, model serving, and federated learning.

4.0K
Stable
Python
MLOps
Inference
React
#ai #machine-learning #federated-learning

ModelTC/LightLLM

LightLLM is a high-performance, scalable Python-based framework for inference and serving of large language models.

3.9K
Active
Python
LLM Frameworks
API Frameworks
#llm #model-serving #deep-learning

skyzh/tiny-llm

A course for systems engineers on building a tiny vLLM-style inference and serving stack for Qwen models on Apple Silicon.

3.9K
Stable
Python
LLM Frameworks
LLM Wrappers & SDKs
Python
#llm #qwen #vllm

NVIDIA/GenerativeAIExamples

Collection of generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

3.8K
Active
Jupyter Notebook
LLM Frameworks
Inference
React
#gpu-acceleration #large-language-models #microservice

Lightning-AI/LitServe

A minimal Python framework for building custom AI inference servers with full control over logic, batching, and scaling.

3.8K
Active
Python
AI Model Serving
API Frameworks
FastAPI
#ai #inference #serving
