Local Inference Engines

llama.cpp, whisper.cpp, GGML - inference engines for local hardware

Showing 1-20 of 22 projects

open-webui/open-webui

Self-hosted AI platform with Ollama and OpenAI API support

125.9K

Active

Python

MCP Frameworks

Agents & Orchestration

Docker

#ai-platform #ollama #openai-api

ggml-org/llama.cpp

Run LLMs locally in C/C++ with high performance

96.8K

Active

C++

Local Inference Engines

#llama.cpp #ggml #C++

meta-llama/llama

Inference code for running Llama 2 models

59.2K

Archived

Python

Inference

Local Inference Engines

#llama2 #inference #ai-models

zylon-ai/private-gpt

PrivateGPT lets you query your own documents with GPT-style models privately, without data leaks.

57.1K

Archived

Python

RAG & Vector

LLM Wrappers & SDKs

Python

#private-gpt #llm #rag

xai-org/grok-1

Open-source Grok-1 model for local inference with JAX

51.5K

Archived

Python

Inference

Local Inference Engines

JAX

#grok-1 #llm #inference

mudler/LocalAI

Self-hosted, open-source AI alternative to OpenAI with local LLM inference, no GPU required

43.3K

Active

MCP Servers

Local Inference Engines

#local-ai #llm-inference #open-source

google/langextract

Extracts structured information from text using LLMs, with source grounding

34.3K

Stable

Python

LLM Wrappers & SDKs

Local Inference Engines

Python

#llm #information-extraction #gemini

microsoft/BitNet

1-bit LLM inference framework for CPU/GPU

28.7K

Active

Python

Inference

Local Inference Engines

bitnet.cpp

#1-bit-llm #cpu-inference #gpu-inference

QwenLM/Qwen3

Qwen3 is Alibaba Cloud's large language model series with enhanced reasoning and coding capabilities.

26.8K

Stable

Python

LLM Frameworks

Local Inference Engines

Hugging Face

#large-language-model #llm #alibaba-cloud

black-forest-labs/flux

Official FLUX.1 inference repo for image generation & editing

25.3K

Experimental

Python

Inference

Local Inference Engines

PyTorch

#flux #image-generation #ai-inference

OpenBMB/MiniCPM-o

On-device multimodal LLM for vision, speech, and live streaming on phones

24.0K

Active

Python

Inference

Local Inference Engines

llama.cpp-omni

#minicpm-o #multimodal-llm #on-device-ai

SYSTRAN/faster-whisper

Faster Whisper transcription with CTranslate2 for efficient speech-to-text

21.3K

Stable

Python

Inference

Local Inference Engines

CTranslate2

#speech-to-text #inference #quantization

bentoml/OpenLLM

Deploy open-source LLMs as OpenAI-compatible API endpoints using BentoML's model serving framework.

12.1K

Active

Python

AI Model Serving

Local Inference Engines

BentoML

#llm-inference #bentoml #model-serving

nullclaw/nullclaw

Autonomous AI assistant infrastructure in Zig: a fast, minimal, self-contained runtime for building AI agents.

5.6K

Active

Zig

Local Inference Engines

LLM Frameworks

Zig

#zig-runtime #autonomous-ai #lightweight-inference

nearai/ironclaw

Rust implementation of OpenClaw focusing on privacy-preserving AI model execution and security hardening.

4.2K

Active

Rust

Local Inference Engines

AI SDKs & Wrappers

Rust

#privacy-preserving #rust-inference #openclaw

OHF-Voice/piper1-gpl

Fast local neural text-to-speech engine for offline voice synthesis

3.1K

Active

C++

Local Inference Engines

AI Voice & Speech

C++

#text-to-speech #tts #neural

Tencent-Hunyuan/HunyuanImage-3.0

Native multimodal model for high-quality image generation with text-to-image capabilities

2.9K

Active

Python

AI Image & Video

Local Inference Engines

PyTorch

#text-to-image #diffusion-model #multimodal

mostlygeek/llama-swap

Reliable model swapping for local LLM servers: switch between llama.cpp, vLLM, and compatible backends

2.6K

Active

Local Inference Engines

LLM Wrappers & SDKs

llama.cpp

#local-llm #model-swapping #llama-cpp

QwenLM/Qwen3.5

Large language model by Alibaba Cloud Qwen team for advanced NLP and AI applications

1.8K

Active

LLM Frameworks

Local Inference Engines

PyTorch

#large-language-model #qwen #llm

tnm/zclaw

Lightweight AI assistant for ESP32 microcontrollers with GPIO, scheduling, custom tools, and memory.

1.7K

Active

Local Inference Engines

Arduino & Embedded

ESP32

#esp32-ai #embedded-llm #edge-inference
