High-performance serving framework for large language and multimodal models
Minimal PyTorch GPT re-implementation for training and inference
Comprehensive LLM engineering and application resources with training, inference, compression, and deployment guides
High-performance mobile-optimized neural network inference framework for deploying AI models on mobile devices
Faster Whisper transcription with CTranslate2 for efficient speech-to-text
Multilingual voice generation model with full-stack capabilities for TTS, training, and deployment
A single-file C implementation of Llama 2 for efficient large language model inference
Inference code and example notebooks for the Meta Segment Anything Model 2 (SAM 2).
A comprehensive guide to building with the LLaMA language model, covering inference, fine-tuning, and end-to-end solutions.
High-performance in-browser LLM inference engine for building AI-powered web applications and tools.
An open-source machine learning engineering reference with resources for training, deploying, and scaling AI models.
A flexible framework for optimizing heterogeneous LLM inference and fine-tuning workflows.
Inference code for Code Llama, Meta's family of code-specialized Llama models for AI-powered coding tools and workflows.
A comprehensive machine learning library in Python with implementations of various algorithms and models.
A static analyzer for Java, C, C++, and Objective-C written in OCaml.
A powerful pattern matching library for TypeScript with smart type inference to simplify control flow.
A comprehensive collection of free LLM inference resources accessible via API for AI developers.
AirLLM enables 70B-parameter model inference on a single 4GB GPU.
A collection of high-performance large language models (LLMs) with recipes to pretrain, finetune, and deploy at scale.
TensorRT-LLM provides a Python API and optimizations for running large language models efficiently on NVIDIA GPUs.