A powerful multimodal AI model for real-time vision and speech interaction, built for developers who work with AI tools.
An open-source framework for AI-powered process automation with support for large language, action, and multimodal models.
An implementation of detailed, localized image and video captioning using large multimodal models.
An official implementation of a system for improving video understanding and generation with better captions.