Explore Projects

Discover 178 open source projects

Active filters (1):
Search: multimodality

Showing 121-140 of 178 projects

Gen-Verse/MMaDA

An open-source diffusion-based multimodal LLM framework for unified understanding and generation.

1.6K
Stable
Python
LLM Frameworks
Agents & Orchestration
Python
#diffusion-models#llm-reasoning#unified-multimodal-understanding-and-generation

ByteDance-Seed/Seed1.5-VL

A powerful vision-language foundation model designed to advance multimodal AI understanding and reasoning.

1.6K
Experimental
Jupyter Notebook
LLM Frameworks
Computer Vision
Jupyter Notebook
#multimodal-ai#vision-language-model#large-language-model

dandelin/ViLT

A Vision-and-Language Transformer model for multimodal tasks without the need for convolution or region supervision.

1.5K
Archived
Python
Computer Vision
LLM Frameworks
Python
#vision-language#multimodal#transformer

emcf/thepipe

A Python library that helps developers extract structured data from tricky documents using vision-language models.

1.5K
Stable
Python
LLM Frameworks
ETL & Pipelines
Python
#document-processing#large-language-models#multimodal

open-mmlab/Multimodal-GPT

Multimodal-GPT is a powerful library for building AI-powered applications that leverage multimodal data like text, images, and more.

1.5K
Archived
Python
LLM Frameworks
Computer Vision
PyTorch
#multimodal#gpt#llama

junyanz/BicycleGAN

Generates diverse, realistic output images from a single input image using multimodal image-to-image translation with GANs.

1.5K
Archived
Python
PyTorch
#image-to-image-transformation#generative-adversarial-networks#deep-learning

OpenAdaptAI/OpenAdapt

An open-source framework for AI-powered process automation with support for large language, action, and multimodal models.

1.5K
Active
Python
LLM Frameworks
Agents & Orchestration
Python
#ai-agents#process-automation#large-language-models

EvolvingLMMs-Lab/open-r1-multimodal

A fork of the open-r1 project that adds multimodal model training capabilities.

1.5K
Experimental
Python
LLM Frameworks
CLI Tools
Python
#multimodal-training#open-source#llm

TIGER-AI-Lab/TheoremExplainAgent

A Python library for creating video-based multimodal explanations for LLM theorem understanding.

1.5K
Experimental
Python
LLM Frameworks
Agents & Orchestration
#llm#explanation#video

baaivision/Emu3.5

A Python library that provides native multimodal models for building world-learning AI systems.

1.5K
Stable
Python
LLM Frameworks
Agents & Orchestration
Python
#multimodal#world-learning#agents

NVlabs/describe-anything

An implementation for detailed localized image and video captioning using large multimodal models.

1.5K
Experimental
Python
Computer Vision
LLM Frameworks
Python
#describe-anything#detailed-localized-captioning#large-multimodal-models

AIDC-AI/Ovis

A novel Multimodal Large Language Model (MLLM) architecture for structurally aligning visual and textual embeddings.

1.4K
Stable
Python
LLM Frameworks
Vision-Language Model
Python
#chatbot#llama3#multimodal

UbiquitousLearning/mllm

A fast C++ library for running multimodal large language models on mobile devices.

1.4K
Active
C++
LLM Frameworks
Cross-Platform
#llm#mobile#multimodal

akdeb/ElatoAI

Realtime AI voice agents with state-of-the-art multimodal AI models for AI toys, companions, and devices.

1.4K
Active
TypeScript
AI Voice & Speech
Arduino & Embedded
TypeScript
#ai#voice#realtime

DWCTOD/CVPR2024-Papers-with-Code-Demo

A curation of the latest CVPR (Computer Vision and Pattern Recognition) papers, code, and demos for AI-powered developers.

1.4K
Archived
Computer Vision
Tutorials & Courses
#computer-vision#cvpr#tutorials

jrzaurin/pytorch-widedeep

A flexible package for multimodal deep learning to combine tabular, text, and image data using Wide and Deep models in PyTorch.

1.4K
Stable
Python
LLM Frameworks
API Frameworks
PyTorch
#deep-learning#multimodal#tabular-data

Eurus-Holmes/Awesome-Multimodal-Research

A curated list of research on multimodal learning, useful for developers working on AI-powered applications.

1.4K
Archived
Python
Multimodal Learning
#multimodal#research#awesome-list

Sun-Haoyuan23/Awesome-RL-based-Reasoning-MLLMs

This repository provides valuable resources for researchers working on RL-based Reasoning MLLMs.

1.4K
Stable
LLM Frameworks
Tutorials & Courses
#ml#llm#reinforcement-learning

zhaochen0110/Awesome_Think_With_Images

A curated list of resources for leveraging visual information in large vision-language models (LVLMs) for complex reasoning, planning, and generation.

1.3K
Stable
LLM Frameworks
Multimodal Reasoning & Visual Reasoning
#large-vision-language-models#multimodal-reasoning#visual-reasoning

llm-jp/awesome-japanese-llm

Comprehensive overview of Japanese Large Language Models (LLMs) for developers interested in generative AI.

1.3K
Active
TypeScript
LLM Frameworks
Generative AI
#japanese#llm#foundation-models