A collection of multimodal large language models and their latest advances.
A large-scale vision-language model for video understanding and generation.
A comprehensive multimodal system for long-term streaming video and audio interactions using large language models.
A curated list of resources for leveraging visual information in large vision-language models (LVLMs) for complex reasoning, planning, and generation.
An official implementation of a system for improving video understanding and generation with better captions.