A frontier multimodal foundation model for advanced image and video understanding tasks.
A project exploring human-machine collaboration using a robotic arm, large language models, and multimodal AI.
A large multimodal multilingual dataset of image-text pairs from Wikipedia for machine learning research.
A C-based multimode monitor supporting various digital radio protocols.
An official implementation of a system for improving video understanding and generation with better captions.
Aria is an open-source multimodal AI framework for building vision and language models.
Uni-MoE is a family of large multimodal models built on a mixture-of-experts architecture.
A real-time multimodal emotion recognition web app for text, sound, and video inputs.
A multimodal large language model series supporting Chinese and English, with AI-powered coding and painting capabilities.
A multimodal learning architecture published at CVPR 2024 and in TPAMI 2025.
A general representation model for cross-modal learning across vision, audio, and language.
An open-source Chinese medical multimodal model that can summarize chest radiographs.
A family of lightweight multimodal models built on large language models such as ChatGPT and GPT-4.
An image-text multimodal deep learning model for object detection and recognition.
An end-to-end web agent built with large multimodal models, enabling AI-powered web browsing and automation.
An official implementation of CLIP4Clip, a model for end-to-end video clip retrieval.
A comparative framework for building multimodal recommender systems using collaborative filtering and matrix factorization.
A multimodal dataset for emotion recognition in conversation, useful for building conversational AI and chatbots.