OpenMMLab's toolbox and benchmark for advanced video understanding and action recognition.
A curated list of resources on action recognition and related areas for developers working with computer vision and video processing.
An open-source project that enables developers to build chatbots with video understanding using large language models.
A video foundation model and dataset for multimodal and video understanding tasks.
A scalable multimodal reasoning framework for AI-powered applications with a focus on video and image understanding.
A highly efficient module for temporal modeling in video understanding tasks.
An open-source toolbox for action understanding based on PyTorch, focused on computer vision and video analysis.
A self-supervised video representation learning model for video understanding tasks.
PaddleVideo is a powerful toolkit for video understanding tasks like action recognition, localization, and detection.
Code and models for Temporal Segment Networks (TSN) for action recognition in videos.
SALMONN is a suite of advanced multimodal large language models (LLMs) for audio, speech, and video understanding.
A curated list of research papers on visual grounding, a key technique for multimodal AI.
A PyTorch implementation of Temporal Segment Networks (TSN) for video understanding and action recognition.