Explore Projects

Discover 13 open source projects

Search: video-understanding

open-mmlab/mmaction2

OpenMMLab's toolbox and benchmark for advanced video understanding and action recognition.

4.9K

Archived

Python

Computer Vision

API Frameworks

PyTorch

#video-classification #action-recognition #benchmark

jinwchoi/awesome-action-recognition

A curated list of resources on action recognition and related areas for developers working with computer vision and video processing.

4.0K

Archived

Computer Vision

Documentation

#action-recognition #video-processing #computer-vision

OpenGVLab/Ask-Anything

An open-source project that enables developers to build chatbots with video understanding using large language models.

3.3K

Archived

Python

React

#chatbots #video-understanding #large-language-models

OpenGVLab/InternVideo

A video foundation model and dataset for multimodal and video understanding tasks.

2.2K

Stable

Python

Computer Vision

Datasets

PyTorch

#video-understanding #multimodal #foundation-models

zai-org/GLM-V

A scalable multimodal reasoning framework for AI-powered applications with a focus on video and image understanding.

2.2K

Active

Python

LLM Frameworks

Agents & Orchestration

#multimodal-reasoning #video-understanding #image-to-text

mit-han-lab/temporal-shift-module

A highly efficient module for temporal modeling in video understanding tasks.

2.2K

Archived

Python

Computer Vision

API Frameworks

PyTorch

#acceleration #efficient-model #low-latency

open-mmlab/mmaction

An open-source toolbox for action understanding based on PyTorch, focused on computer vision and video analysis.

1.9K

Archived

Python

Computer Vision

API Frameworks

PyTorch

#action-detection #action-recognition #video-understanding

MCG-NJU/VideoMAE

A self-supervised video representation learning model for video understanding tasks.

1.7K

Archived

Python

Computer Vision

API Frameworks

PyTorch

#video-analysis #video-understanding #self-supervised-learning

PaddlePaddle/PaddleVideo

PaddleVideo is a powerful toolkit for video understanding tasks like action recognition, localization, and detection.

1.7K

Experimental

Python

Computer Vision

API Frameworks

#video-recognition #action-detection #action-localization

yjxiong/temporal-segment-networks

Code and models for Temporal Segment Networks (TSN), an approach to action recognition in video.

1.6K

Archived

Python

Computer Vision

#action-recognition #temporal-segment-networks #video-understanding

bytedance/SALMONN

SALMONN is a suite of advanced multi-modal large language models (LLMs) for audio, speech, and video understanding.

1.4K

Stable

LLM Frameworks

Speech Recognition

#audio-processing #speech-recognition #video-understanding

TheShadow29/awesome-grounding

A curated list of research papers on visual grounding, a key technique for multimodal AI.

1.1K

Stable

Computer Vision

Language Grounding

#computer-vision #language-grounding #multimodal-ai

yjxiong/tsn-pytorch

A PyTorch implementation of Temporal Segment Networks (TSN) for video understanding and action recognition.

1.1K

Archived

Python

Computer Vision

API Frameworks

PyTorch

#action-recognition #video-understanding #deep-learning
