An open-source diffusion-based multimodal LLM framework for unified understanding and generation.
A powerful vision-language foundation model designed to advance multimodal AI understanding and reasoning.
A Vision-and-Language Transformer model for multimodal tasks without the need for convolution or region supervision.
A Python library that helps developers extract structured data from tricky documents using vision-language models.
Multimodal-GPT is a powerful library for building AI-powered applications that leverage multimodal data like text, images, and more.
Generates realistic images from text using a Pix2Pix GAN.
An open-source framework for AI-powered process automation with support for large language, action, and multimodal models.
A fork that adds multimodal model training capabilities to the open-r1 project, a framework for building AI tools.
A Python library that provides native multimodal models for building world-learning AI systems.
A Python library for creating video-based multimodal explanations for LLM theorem understanding.
An implementation for detailed localized image and video captioning using large multimodal models.
A novel Multimodal Large Language Model (MLLM) architecture for structurally aligning visual and textual embeddings.
Fast Multimodal LLM on Mobile Devices: A C++ library for running large language models on mobile devices.
Realtime AI voice agents with state-of-the-art multimodal AI models for AI toys, companions, and devices.
A curated collection of the latest CVPR (Conference on Computer Vision and Pattern Recognition) papers, code, and demos for AI-powered developers.
A flexible package for multimodal deep learning to combine tabular, text, and image data using Wide and Deep models in PyTorch.
A curated list of research on multimodal learning, useful for developers working on AI-powered applications.
This repository provides valuable resources for researchers working on RL-based Reasoning MLLMs.
A curated list of resources for leveraging visual information in large vision-language models (LVLMs) for complex reasoning, planning, and generation.
Comprehensive overview of Japanese Large Language Models (LLMs) for developers interested in generative AI.