Explore Projects

Discover 37 open source projects

Active filters (1):
Search: llm-inference

Showing 1-20 of 37 projects

nomic-ai/gpt4all

Run local LLMs on any device with GPT4All

77.2K
Experimental
C++
Desktop Model Runners
#llm#local-ai#model-runner

ray-project/ray

Ray is a unified framework for scaling AI and Python applications with distributed computing and ML libraries.

41.6K
Active
Python
ML Ops
Containerization
Python
#distributed-computing#ml-ops#ai-framework

gitleaks/gitleaks

Detect secrets in git repos and files

25.2K
Active
Go
Security Research
CLI Tools
#gitleaks#secret-detection#ci-cd
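Tools like gitleaks work by scanning file contents and git history against a ruleset of secret-matching regular expressions. A minimal illustrative sketch of that idea (the patterns below are simplified stand-ins, not gitleaks' actual ruleset, which is far larger and also uses entropy checks):

```python
import re

# Simplified stand-in patterns; real scanners like gitleaks ship a much
# larger, entropy-aware ruleset.
PATTERNS = {
    "aws-access-key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "generic-api-key": re.compile(r"(?i)api[_-]?key\s*[:=]\s*['\"][0-9a-zA-Z]{16,}['\"]"),
}

def scan(text: str) -> list[tuple[str, str]]:
    """Return (rule_name, matched_secret) pairs found in text."""
    findings = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((name, match.group(0)))
    return findings

# AWS's documented example key ID, safe to use in demos.
sample = 'aws_key = "AKIAIOSFODNN7EXAMPLE"'
print(scan(sample))  # → [('aws-access-key', 'AKIAIOSFODNN7EXAMPLE')]
```

In practice you would run `gitleaks detect` in CI rather than roll your own scanner; the sketch only shows the matching mechanism.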

liguodongiot/llm-action

Comprehensive LLM engineering and application resources with training, inference, compression, and deployment guides

23.4K
Stable
HTML
Fine-tuning
Inference
#llm-training#llm-inference#llm-ops

Lightning-AI/litgpt

A collection of high-performance large language models (LLMs) with recipes to pretrain, finetune, and deploy at scale.

13.2K
Active
Python
LLM Frameworks
Python
#ai#artificial-intelligence#large-language-models

bentoml/OpenLLM

Deploy open-source LLMs as OpenAI-compatible API endpoints using BentoML's model serving framework.

12.1K
Active
Python
AI Model Serving
Local Inference Engines
BentoML
#llm-inference#bentoml#model-serving
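"OpenAI-compatible" means the server accepts the same JSON schema as OpenAI's `/v1/chat/completions` endpoint, so any OpenAI client can point at it. A stdlib-only sketch of building such a request (the host, port, and model name are assumptions for illustration; substitute whatever your deployment actually exposes):

```python
import json
import urllib.request

# Assumed local endpoint; adjust to your server's actual host and port.
BASE_URL = "http://localhost:3000/v1"

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completions request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("llama-3", "Hello!")
print(req.full_url)  # → http://localhost:3000/v1/chat/completions
```

The same request shape works against any of the OpenAI-compatible servers in this list (OpenLLM, shimmy, etc.), which is the point of the compatibility claim.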

mistralai/mistral-inference

Official library for running inference with Mistral models.

10.7K
Stable
Jupyter Notebook
LLM Frameworks
React
#llm#llm-inference#mistralai

openvinotoolkit/openvino

OpenVINO is an open-source toolkit for optimizing and deploying AI inference on a variety of hardware.

9.8K
Active
C++
Inference
#ai#computer-vision#deep-learning

Tiiny-AI/PowerInfer

High-performance C++ library for fast local deployment of large language models (LLMs) like LLaMA.

8.8K
Active
C++
LLM Frameworks
API Frameworks
#llm#llm-inference#local-inference

bentoml/BentoML

BentoML is an easy-to-use framework for building and deploying production-ready machine learning models as APIs.

8.5K
Active
Python
LLM Frameworks
API Clients & Testing
Python
#ai-inference#llm-inference#llm-serving

InternLM/lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving large language models (LLMs).

7.7K
Active
Python
LLM Frameworks
Inference
Python
#llm#inference#deployment

katanemo/plano

Infrastructure for agentic apps: an AI-native proxy and data plane.

5.9K
Active
Rust
Rust
#proxy#gateway#LLM

algorithmicsuperintelligence/openevolve

Open-source implementation of AlphaEvolve, a coding agent for iterative code optimization and discovery.

5.5K
Active
Python
Agents & Orchestration
AI Coding Agents
Python
#alpha-evolve#coding-agent#llm-engineering

superduper-io/superduper

Superduper is an end-to-end framework for building custom AI applications and agents using Python, PyTorch, and Transformers.

5.3K
Stable
Python
LLM Frameworks
Agents & Orchestration
PyTorch
#ai#chatbot#mlops

flashinfer-ai/flashinfer

A Python library for serving large language models (LLMs) with high performance, including GPU acceleration and distributed inference.

5.1K
Active
Python
LLM Frameworks
Inference
PyTorch
#llm#inference#cuda

xlite-dev/Awesome-LLM-Inference

A curated list of awesome papers and code for optimizing LLM/VLM inference performance

5.0K
Active
Python
LLM Frameworks
LLM Wrappers & SDKs
#llm#inference#optimization

FellouAI/eko

Eko is an agentic framework that helps developers build production-ready AI-powered workflows with natural language interactions.

4.9K
Active
TypeScript
Agents & Orchestration
LLM Frameworks
TypeScript
#agent#agentic-ai#natural-language-inference

gpustack/gpustack

Optimize AI inference performance on GPUs with this Python library for selecting and tuning inference engines.

4.6K
Active
Python
Inference
CLI Tools
Python
#ai-inference#gpu-acceleration#performance-optimization

NVIDIA/GenerativeAIExamples

Collection of generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.

3.8K
Active
Jupyter Notebook
LLM Frameworks
Inference
React
#gpu-acceleration#large-language-models#microservice

Michael-A-Kuykendall/shimmy

A free, open-source Rust inference server compatible with the OpenAI API, aimed at vibe coders

3.7K
Active
Rust
React
#authentication#inference-server#open-source
