Explore Projects

Discover 178 open source projects

Active filters (1):
Search: multimodal

Showing 161-178 of 178 projects

DAMO-NLP-SG/VideoLLaMA3

A frontier multimodal foundation model for advanced image and video understanding tasks.

1.1K
Stable
Jupyter Notebook
Computer Vision
LLM Frameworks
Jupyter Notebook
#computer-vision #multimodal-learning #foundation-models

TommyZihao/vlm_arm

A project exploring human-machine collaboration using a robotic arm, large language models, and multimodal AI.

1.1K
Experimental
Jupyter Notebook
Agents & Orchestration
Robotics
#robotics #multimodal-ai #human-machine-collaboration

google-research-datasets/wit

A large multimodal multilingual dataset of image-text pairs from Wikipedia for machine learning research.

1.1K
Archived
Multimodal
NLP
#machine-learning #nlp #multimodal

EliasOenal/multimon-ng

This C-based multimode monitor project supports various digital radio protocols.

1.1K
Active
C
CLI Tools
Arduino & Embedded
#digital-radio #signal-processing #embedded-systems

ShareGPT4Omni/ShareGPT4Video

An official implementation of a system for improving video understanding and generation with better captions.

1.1K
Archived
Python
LLM Frameworks
Computer Vision
PyTorch
#chatgpt #gpt-4 #computer-vision

rhymes-ai/Aria

Aria is an open-source multimodal AI framework for building vision and language models.

1.1K
Archived
Jupyter Notebook
Agents & Orchestration
Computer Vision
Jupyter Notebook
#multimodal #vision-and-language #mixture-of-experts

HITsz-TMG/Uni-MoE

Uni-MoE is the large multimodal model family of the Lychee project.

1.1K
Stable
Python
LLM Frameworks
LLM Wrappers & SDKs
Python
#multimodal-models #large-language-models #ai-development

maelfabien/Multimodal-Emotion-Recognition

A real-time multimodal emotion recognition web app for text, sound, and video inputs.

1.1K
Archived
Jupyter Notebook
Emotion Recognition
Frontend Frameworks
Keras
#emotion-analysis #emotion-detection #real-time

OpenBMB/VisCPM

A series of Chinese-English bilingual multimodal large models for multimodal conversation and text-to-image generation.

1.1K
Archived
Python
LLM Frameworks
Multimodal
PyTorch
#diffusion-models #large-language-models #transformers

AILab-CVC/UniRepLKNet

A large-kernel convolutional network architecture for multimodal learning, published at CVPR 2024 and in TPAMI 2025.

1.1K
Stable
Python
Computer Vision
Multimodal Learning
PyTorch
#architecture #artificial-intelligence #convolutional-neural-networks

OFA-Sys/ONE-PEACE

A general representation model for cross-modal learning across vision, audio, and language.

1.1K
Archived
Python
LLM Frameworks
Representation Learning
Python
#multimodal #contrastive-learning #foundation-models

WangRongsheng/XrayGLM

An open-source Chinese medical multimodal model that can summarize chest radiographs.

1.1K
Archived
Python
LLM Frameworks
API Frameworks
Python
#medical #multimodal #llm

BAAI-DCAI/Bunny

A family of lightweight multimodal models.

1.1K
Archived
Python
LLM Frameworks
LLM Wrappers & SDKs
Python
#chatgpt #gpt-4 #multimodal

ashkamath/mdetr

An image-text multimodal deep learning model for object detection and recognition.

1.0K
Archived
Python
Computer Vision
LLM Frameworks
PyTorch
#computer-vision #object-detection #multimodal-learning

MinorJerry/WebVoyager

An end-to-end web agent built with large multimodal models, enabling AI-powered web browsing and automation.

1.0K
Archived
Python
Agents & Orchestration
Full-Stack Frameworks
Python
#ai-agent #multimodal-models #web-automation

ArrowLuo/CLIP4Clip

An official implementation of CLIP4Clip, a model for end-to-end video clip retrieval.

1.0K
Archived
Python
Computer Vision
Multimodal Learning
Python
#multimodal #video-retrieval #clip

PreferredAI/cornac

A comparative framework for building multimodal recommender systems using collaborative filtering and matrix factorization.

1.0K
Stable
Python
Recommender System
Machine Learning Frameworks
Python
#collaborative-filtering #matrix-factorization #multimodal-learning

declare-lab/MELD

A multimodal dataset for emotion recognition in conversation, useful for building conversational AI and chatbots.

1.0K
Archived
Python
Computer Vision
Agents & Orchestration
Python
#emotion-recognition #multimodal-interactions #dialogue-systems