A fast and accurate object detection method that incorporates newer techniques such as NAS backbones and an efficient RepGFPN.
An open-source, instruction-tuned audio-visual language model for video understanding.
A powerful multi-modal large language model family for building advanced AI chatbots and visual recognition models.
Official codebase for Alibaba DAMO Conversational AI, a deep learning-powered dialog system.
VideoLLaMA 2 is a Python library that advances spatial-temporal modeling and audio understanding in video-based large language models.
A frontier multimodal foundation model for advanced image and video understanding tasks.