Explore Projects

Discover 37 open source projects

Active filters (1):
Search: llm-inference

Showing 1-20 of 37 projects

nomic-ai/gpt4all

Run local LLMs on any device with GPT4All

77.2K
Experimental
C++
Desktop Model Runners
#llm#local-ai#model-runner

ray-project/ray

Ray is a unified framework for scaling AI and Python applications with distributed computing and ML libraries.

41.6K
Active
Python
ML Ops
Containerization
Python
#distributed-computing#ml-ops#ai-framework

gitleaks/gitleaks

Detect secrets in git repos and files

25.2K
Active
Go
Security Research
CLI Tools
#gitleaks#secret-detection#ci-cd
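Tools like gitleaks work by scanning file contents and git history against a ruleset of secret-matching regular expressions. A minimal illustrative sketch of that idea (the patterns below are simplified stand-ins, not gitleaks' actual ruleset, which is far larger and also uses entropy checks):

```python
import re

# Simplified stand-in patterns; real scanners like gitleaks ship a much
# larger, entropy-aware ruleset.
PATTERNS = {
    "aws-access-key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic-api-key": re.compile(r"(?i)api[_-]?key\s*[:=]\s*['\"][0-9a-zA-Z]{16,}['\"]"),
}

def scan(text: str) -> list[tuple[str, str]]:
    """Return (rule_name, matched_secret) pairs found in text."""
    findings = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((name, match.group(0)))
    return findings

# AWS's documented example key ID, safe to use in demos.
sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"'
print(scan(sample))  # → [('aws-access-key', 'AKIAIOSFODNN7EXAMPLE')]
```

In practice you would run `gitleaks detect` in CI rather than roll your own scanner; the sketch only shows the matching mechanism.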

liguodongiot/llm-action

Comprehensive LLM engineering and application resources with training, inference, compression, and deployment guides

23.4K
Stable
HTML
Fine-tuning
Inference
#llm-training#llm-inference#llm-ops

Lightning-AI/litgpt

A collection of high-performance large language models (LLMs) with recipes to pretrain, finetune, and deploy at scale.

13.2K
Active
Python
LLM Frameworks
Python
#ai#artificial-intelligence#large-language-models

bentoml/OpenLLM

Deploy open-source LLMs as OpenAI-compatible API endpoints using BentoML's model serving framework.

12.1K
Active
Python
AI Model Serving
Local Inference Engines
BentoML
#llm-inference#bentoml#model-serving
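"OpenAI-compatible" means the server accepts the same JSON schema as OpenAI's `/v1/chat/completions` endpoint, so any OpenAI client can point at it. A stdlib-only sketch of building such a request (the host, port, and model name are assumptions for illustration; substitute whatever your deployment actually exposes):

```python
import json
import urllib.request

# Assumed local endpoint; adjust to your server's actual host and port.
BASE_URL = "http://localhost:3000/v1"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("llama-3", "Hello!")
print(req.full_url)  # → http://localhost:3000/v1/chat/completions
```

The same request shape works against any of the OpenAI-compatible servers in this list (OpenLLM, shimmy, etc.), which is the point of the compatibility claim.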

mistralai/mistral-inference

Official library for running inference with Mistral models.

10.7K
Stable
Jupyter Notebook
LLM Frameworks
React
#llm#llm-inference#mistralai

openvinotoolkit/openvino

OpenVINO is an open-source toolkit for optimizing and deploying AI inference on a variety of hardware.

9.8K
Active
C++
Inference
#ai#computer-vision#deep-learning

Tiiny-AI/PowerInfer

High-performance C++ library for fast local deployment of large language models (LLMs) like LLaMA.

8.8K
Active
C++
LLM Frameworks
API Frameworks
#llm#llm-inference#local-inference

bentoml/BentoML

BentoML is an easy-to-use framework for building and deploying production-ready machine learning models as APIs.

8.5K
Active
Python
LLM Frameworks
API Clients & Testing
Python
#ai-inference#llm-inference#llm-serving

InternLM/lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving large language models (LLMs).

7.7K
Active
Python
LLM Frameworks
Inference
Python
#llm#inference#deployment

katanemo/plano

Infrastructure for agentic apps: an AI-native proxy and data plane.

5.9K
Active
Rust
Rust
#proxy#gateway#LLM

algorithmicsuperintelligence/openevolve

Open-source implementation of AlphaEvolve, a coding agent for iterative code optimization and discovery.

5.5K
Active
Python
Agents & Orchestration
AI Coding Agents
Python
#alpha-evolve#coding-agent#llm-engineering

superduper-io/superduper

Superduper is an end-to-end framework for building custom AI applications and agents using Python, PyTorch, and Transformers.

5.3K
Stable
Python
LLM Frameworks
Agents & Orchestration
PyTorch
#ai#chatbot#mlops

flashinfer-ai/flashinfer

A Python library for serving large language models (LLMs) with high performance, including GPU acceleration and distributed inference.

5.1K
Active
Python
LLM Frameworks
Inference
PyTorch
#llm#inference#cuda

xlite-dev/Awesome-LLM-Inference

A curated list of awesome papers and code for optimizing LLM/VLM inference performance

5.0K
Active
Python
LLM Frameworks
LLM Wrappers & SDKs
#llm#inference#optimization

FellouAI/eko

Eko is an agentic framework that helps developers build production-ready AI-powered workflows with natural language interactions.

4.9K
Active
TypeScript
Agents & Orchestration
LLM Frameworks
TypeScript
#agent#agentic-ai#natural-language-inference

gpustack/gpustack

Optimize AI inference performance on GPUs with this Python library for selecting and tuning inference engines.

4.6K
Active
Python
Inference
CLI Tools
Python
#ai-inference#gpu-acceleration#performance-optimization

NVIDIA/GenerativeAIExamples

Collection of generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

3.8K
Active
Jupyter Notebook
LLM Frameworks
Inference
React
#gpu-acceleration#large-language-models#microservice

Michael-A-Kuykendall/shimmy

A free, open-source Rust inference server compatible with the OpenAI API, aimed at vibe coders

3.7K
Active
Rust
React
#authentication#inference-server#open-source
