A collection of multimodal large language models and their latest advances.
A powerful GUI agent family that enables mobile automation, multimodal interaction, and integration with AI tools.
A high-quality end-to-end speech interaction model for AI-powered voice applications.
A powerful multimodal AI model for real-time vision and speech interaction, built for developers who work with AI tools.
A modular multimodal large language model for advanced document understanding and analysis.
Cambrian-1 is a multimodal LLM with a vision-centric design for building AI-powered chatbots and applications.
A powerful text-to-image diffusion model that can be used for recaptioning, planning, and generating with multimodal LLMs.
A powerful vision-language foundation model designed to advance multimodal AI understanding and reasoning.
A novel Multimodal Large Language Model (MLLM) architecture for structurally aligning visual and textual embeddings.
A real-time, voice-interactive digital human with customizable appearance and voice, including voice cloning.
A curated collection of awesome unified multimodal models for text-to-image generation and vision-language tasks.
A family of lightweight multimodal models for ChatGPT, GPT-4, and other large language models.