A high-throughput LLM inference engine for developers.
BentoML is an easy-to-use framework for building and deploying production-ready machine learning models as APIs.
A repository sharing notes and references on deploying deep learning models in production.
Olares is an open-source personal cloud platform to reclaim your data and enable local AI computing.
A unified and scalable ML library for large-scale distributed training, model serving, and federated learning.
LightLLM is a high-performance, scalable Python-based framework for inference and serving of large language models.
A comprehensive collection of resources and tutorials for building AI infrastructure and systems.
A multi-LoRA inference server that scales to thousands of fine-tuned LLMs.
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
A Python framework for efficient inference with omni-modal AI models.
A reproducible development environment for humans and agents.
A community-maintained hardware plugin for running large language models (LLMs) on Ascend accelerators.
MLRun is an open-source MLOps platform for building and managing continuous ML applications.
An open-source DevOps tool for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.
Hopsworks is a feature store and MLOps platform for data-intensive AI and machine learning applications.
The simplest way to serve AI/ML models in production, with support for popular models like Stable Diffusion and Whisper.
RTP-LLM is a high-performance LLM inference engine from Alibaba for diverse AI applications.