A comprehensive guide for Chinese developers on deploying and fine-tuning open-source LLMs on Linux.
An on-device multimodal LLM for vision, speech, and live streaming on phones.
Microsoft's research repo for large-scale self-supervised pre-training across tasks, languages, and modalities.
An open agentic framework that uses computers like a human.
A powerful family of GUI agents for mobile automation and multimodal interaction.
MagicQuill is an intelligent, interactive image editing system (CVPR 2025).
Code and models for a multimodal large language model that can perform any-to-any tasks.
A comprehensive multimodal system for long-term streaming video and audio interactions using large language models.
A powerful multimodal AI model for real-time vision and speech interaction.
A modular multimodal large language model for advanced document understanding and analysis.
Cambrian-1 is a multimodal LLM with a vision-centric design.
A curated collection of YOLO object detection projects and datasets.
A novel Multimodal Large Language Model (MLLM) architecture for structurally aligning visual and textual embeddings.
A C++ library for fast multimodal LLM inference on mobile devices.
Mulberry is an o1-like reasoning and reflection MLLM built via Collective Monte Carlo Tree Search (MCTS).
A next-generation object detection model that can detect anything with high accuracy and efficiency.
A family of lightweight multimodal models.