Framework providing state-of-the-art ML models for text, vision, audio, and multimodal tasks.
All-in-one AI app for local and remote LLM usage with RAG, agents, and MCP compatibility
Multimodal AI agent stack for GUI and browser automation
LLaVA is a visual instruction tuning framework for large language and vision models, enabling GPT-4 level capabilities.
High-performance serving framework for large language and multimodal models
On-device multimodal LLM for vision, speech, and live streaming on phones
Microsoft's research repo for large-scale self-supervised pre-training across tasks, languages, and modalities
Build and deploy AI services with a cloud-native stack
Multimodal large language model series for developers
Janus-Series: unified multimodal understanding and generation models.
A collection of multimodal large language models and their latest advances.
An open-source tool for recording screens and microphones, designed for developers building with AI tools.
A scalable generative AI framework for researchers and developers
A fast, lightweight deep learning framework used in Alibaba's business-critical use cases, supporting LLM and 3D avatar apps.
A Python library for using and fine-tuning over 900 large language models and multimodal models for various AI tasks.
An open-source AI avatar toolkit for offline video generation and digital human cloning.
LAVIS is a comprehensive library for multimodal deep learning, including image captioning, visual question answering, and more.
An open-source framework for building voice and multimodal conversational AI applications.
An open-source SDK for logging, storing, querying, and visualizing multimodal and multi-rate data.
Production-ready toolkit for local AI inference