A frontier multimodal foundation model for advanced image and video understanding tasks.
A project exploring human-machine collaboration using a robotic arm, large language models, and multimodal AI.
A large multimodal multilingual dataset of image-text pairs from Wikipedia for machine learning research.
A C-based multimode monitor supporting various digital radio protocols.
An official implementation of a system for improving video understanding and generation with better captions.
Aria is an open-source multimodal AI framework for building vision and language models.
Uni-MoE is a family of large multimodal models built on a mixture-of-experts architecture.
A real-time multimodal emotion recognition web app for text, sound, and video inputs.
A multimodal large language model series supporting Chinese and English, with AI-powered coding and painting capabilities.
A multimodal learning architecture published at CVPR 2024 and in TPAMI 2025.
A general representation model for cross-modal learning across vision, audio, and language.
An open-source Chinese medical multimodal model that can summarize chest radiographs.
A family of lightweight multimodal models built on large language models such as ChatGPT and GPT-4.
An image-text multimodal deep learning model for object detection and recognition.
An end-to-end web agent built with large multimodal models, enabling AI-powered web browsing and automation.
An official implementation of CLIP4Clip, a model for end-to-end video clip retrieval.
A comparative framework for building multimodal recommender systems using collaborative filtering and matrix factorization.
A multimodal dataset for emotion recognition in conversation, useful for building conversational AI and chatbots.