Explore Projects

Discover 13 open source projects

Active filters (1):
Search: video-understanding

Showing 1-13 of 13 projects

open-mmlab/mmaction2

OpenMMLab's toolbox and benchmark for advanced video understanding and action recognition.

4.9K
Archived
Python
Computer Vision
API Frameworks
PyTorch
#video-classification #action-recognition #benchmark

jinwchoi/awesome-action-recognition

A curated list of resources on action recognition and related areas, for developers working with computer vision and video processing.

4.0K
Archived
Computer Vision
Documentation
#action-recognition #video-processing #computer-vision

OpenGVLab/Ask-Anything

An open-source project that enables developers to build chatbots with video understanding using large language models.

3.3K
Archived
Python
React
#chatbots #video-understanding #large-language-models

OpenGVLab/InternVideo

A video foundation model and dataset for multimodal and video understanding tasks.

2.2K
Stable
Python
Computer Vision
Datasets
PyTorch
#video-understanding #multimodal #foundation-models

zai-org/GLM-V

A scalable multimodal reasoning framework for AI-powered applications with a focus on video and image understanding.

2.2K
Active
Python
LLM Frameworks
Agents & Orchestration
#multimodal-reasoning #video-understanding #image-to-text

mit-han-lab/temporal-shift-module

A highly efficient module for temporal modeling in video understanding tasks.

2.2K
Archived
Python
Computer Vision
API Frameworks
PyTorch
#acceleration #efficient-model #low-latency

open-mmlab/mmaction

An open-source toolbox for action understanding based on PyTorch, focused on computer vision and video analysis.

1.9K
Archived
Python
Computer Vision
API Frameworks
PyTorch
#action-detection #action-recognition #video-understanding

MCG-NJU/VideoMAE

A self-supervised video representation learning model for video understanding tasks.

1.7K
Archived
Python
Computer Vision
API Frameworks
PyTorch
#video-analysis #video-understanding #self-supervised-learning

PaddlePaddle/PaddleVideo

PaddleVideo is a powerful toolkit for video understanding tasks like action recognition, localization, and detection.

1.7K
Experimental
Python
Computer Vision
API Frameworks
#video-recognition #action-detection #action-localization

yjxiong/temporal-segment-networks

Code and models for Temporal Segment Networks (TSN) for action recognition in video understanding.

1.6K
Archived
Python
Computer Vision
#action-recognition #temporal-segment-networks #video-understanding

bytedance/SALMONN

SALMONN is a suite of advanced multi-modal large language models (LLMs) for audio, speech, and video understanding.

1.4K
Stable
LLM Frameworks
Speech Recognition
#audio-processing #speech-recognition #video-understanding

TheShadow29/awesome-grounding

A curated list of research papers on visual grounding, a key technique for multimodal AI.

1.1K
Stable
Computer Vision
Language Grounding
#computer-vision #language-grounding #multimodal-ai

yjxiong/tsn-pytorch

A PyTorch implementation of Temporal Segment Networks (TSN) for video understanding and action recognition.

1.1K
Archived
Python
Computer Vision
API Frameworks
PyTorch
#action-recognition #video-understanding #deep-learning
