Janus-Series: Unified Multimodal Understanding and Generation Models for AI-powered vibe coders.
LAVIS is a comprehensive library for multimodal deep learning, including image captioning, visual question answering, and more.
An open-source, instruction-tuned audio-visual language model for video understanding.
A video conversation model that combines LLM capabilities with pretrained visual encoders for video-based chatbots.