LLaVA is a visual instruction tuning framework for large language-and-vision models, built towards GPT-4V level capabilities.
Code and models for a multimodal large language model that can perform any-to-any tasks.
A comprehensive multimodal system for long-term streaming video and audio interactions using large language models.