Local Inference Engines

llama.cpp, whisper.cpp, GGML - inference engines for local hardware

Showing 1-20 of 22 projects

open-webui/open-webui

Self-hosted AI platform with Ollama and OpenAI API support

125.9K
Active
Python
MCP Frameworks
Agents & Orchestration
Docker
#ai-platform #ollama #openai-api

ggml-org/llama.cpp

Run LLMs locally in C/C++ with high performance

96.8K
Active
C++
Local Inference Engines
#llama.cpp #ggml #C++

meta-llama/llama

Inference code for Llama 2 models

59.2K
Archived
Python
Inference
Local Inference Engines
#llama2 #inference #ai-models

zylon-ai/private-gpt

PrivateGPT enables private document interaction with GPT without data leaks.

57.1K
Archived
Python
RAG & Vector
LLM Wrappers & SDKs
Python
#private-gpt #llm #rag

xai-org/grok-1

Open-source Grok-1 model for local inference with JAX

51.5K
Archived
Python
Inference
Local Inference Engines
JAX
#grok-1 #llm #inference

mudler/LocalAI

Self-hosted, open-source AI alternative to OpenAI with local LLM inference, no GPU required

43.3K
Active
Go
MCP Servers
Local Inference Engines
Go
#local-ai #llm-inference #open-source

google/langextract

Extracts structured info from text using LLMs with source grounding

34.3K
Stable
Python
LLM Wrappers & SDKs
Local Inference Engines
Python
#llm #information-extraction #gemini

microsoft/BitNet

1-bit LLM inference framework for CPU/GPU

28.7K
Active
Python
Inference
Local Inference Engines
bitnet.cpp
#1-bit-llm #cpu-inference #gpu-inference

QwenLM/Qwen3

Qwen3 is Alibaba Cloud's large language model series with enhanced reasoning and coding capabilities.

26.8K
Stable
Python
LLM Frameworks
Local Inference Engines
Hugging Face
#large-language-model #llm #alibaba-cloud

black-forest-labs/flux

Official FLUX.1 inference repo for image generation & editing

25.3K
Experimental
Python
Inference
Local Inference Engines
PyTorch
#flux #image-generation #ai-inference

OpenBMB/MiniCPM-o

On-device multimodal LLM for vision, speech, and live streaming on phones

24.0K
Active
Python
Inference
Local Inference Engines
llama.cpp-omni
#minicpm-o #multimodal-llm #on-device-ai

SYSTRAN/faster-whisper

Faster Whisper transcription with CTranslate2 for efficient speech-to-text

21.3K
Stable
Python
Inference
Local Inference Engines
CTranslate2
#speech-to-text #inference #quantization

bentoml/OpenLLM

Deploy open-source LLMs as OpenAI-compatible API endpoints using BentoML's model serving framework.

12.1K
Active
Python
AI Model Serving
Local Inference Engines
BentoML
#llm-inference #bentoml #model-serving

nullclaw/nullclaw

Autonomous AI assistant infrastructure in Zig: a fast, minimal, self-contained runtime for building AI agents.

5.6K
Active
Zig
Local Inference Engines
LLM Frameworks
Zig
#zig-runtime #autonomous-ai #lightweight-inference

nearai/ironclaw

Rust implementation of OpenClaw focusing on privacy-preserving AI model execution and security hardening.

4.2K
Active
Rust
Local Inference Engines
AI SDKs & Wrappers
Rust
#privacy-preserving #rust-inference #openclaw

OHF-Voice/piper1-gpl

Fast local neural text-to-speech engine for offline voice synthesis

3.1K
Active
C++
Local Inference Engines
AI Voice & Speech
C++
#text-to-speech #tts #neural

Tencent-Hunyuan/HunyuanImage-3.0

Native multimodal model for high-quality image generation with text-to-image capabilities

2.9K
Active
Python
AI Image & Video
Local Inference Engines
PyTorch
#text-to-image #diffusion-model #multimodal

mostlygeek/llama-swap

Reliable model swapping for local LLM servers - seamlessly switch between llama.cpp, vLLM, and compatible backends

2.6K
Active
Go
Local Inference Engines
LLM Wrappers & SDKs
llama.cpp
#local-llm #model-swapping #llama-cpp

QwenLM/Qwen3.5

Large language model by Alibaba Cloud Qwen team for advanced NLP and AI applications

1.8K
Active
LLM Frameworks
Local Inference Engines
PyTorch
#large-language-model #qwen #llm

tnm/zclaw

Lightweight AI assistant for ESP32 microcontrollers with GPIO, scheduling, custom tools, and memory.

1.7K
Active
C
Local Inference Engines
Arduino & Embedded
ESP32
#esp32-ai #embedded-llm #edge-inference