Explore Projects

Discover 12 open-source projects

Active filters (1):

Search: multimodal-large-language-models


BradyFU/Awesome-Multimodal-Large-Language-Models

A collection of multimodal large language models and their latest advances.

17.4K

Active

React

#large-language-models#multimodal-chain-of-thought#in-context-learning

X-PLUG/MobileAgent

A powerful GUI agent family that enables mobile automation, multimodal interaction, and integration with AI tools.

8.0K

Stable

Python

AI Coding Agents

Agents & Orchestration

#agent#automation#multimodal

ictnlp/LLaMA-Omni

A low-latency, high-quality end-to-end speech interaction model built on Llama-3.1-8B-Instruct, supporting speech-to-speech voice applications.

3.1K

Experimental

Python

LLM Frameworks

AI Voice & Speech

#large-language-model#speech-interaction#speech-to-speech

VITA-MLLM/VITA

A multimodal large language model for real-time vision and speech interaction, including video understanding.

2.5K

Experimental

Python

LLM Frameworks

Agents & Orchestration

#large-language-model#multimodal#video-understanding

X-PLUG/mPLUG-DocOwl

A modular multimodal large language model for advanced document understanding and analysis.

2.4K

Experimental

Python

LLM Frameworks

RAG Frameworks

#document-understanding#table-understanding#chart-understanding

cambrian-mllm/cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design, usable as a base for chatbots and other vision-language applications.

2.0K

Stable

Python

LLM Frameworks

Computer Vision

#chatbot#computer-vision#large-language-models

YangLing0818/RPG-DiffusionMaster

A training-free text-to-image generation and editing framework that uses multimodal LLMs to recaption, plan, and generate (RPG) with diffusion models.

1.8K

Experimental

Jupyter Notebook

LLM Frameworks

Computer Vision

#text-to-image#image-editing#diffusion

ByteDance-Seed/Seed1.5-VL

A vision-language foundation model for general-purpose multimodal understanding and reasoning.

1.6K

Experimental

Jupyter Notebook

LLM Frameworks

Computer Vision

#multimodal-ai#vision-language-model#large-language-model

AIDC-AI/Ovis

A novel Multimodal Large Language Model (MLLM) architecture for structurally aligning visual and textual embeddings.

1.4K

Stable

Python

LLM Frameworks

Vision-Language Model

#chatbot#llama3#multimodal

Henry-23/VideoChat

A real-time, voice-interactive digital human with customizable appearance and voice, including support for voice cloning.

1.2K

Stable

Python

React

#authentication#streaming#real-time

AIDC-AI/Awesome-Unified-Multimodal-Models

A curated list of unified multimodal models that handle both text-to-image generation and vision-language understanding tasks.

1.1K

Active

LLM Frameworks

Computer Vision

#multimodal-models#text-to-image-generation#vision-language-model

BAAI-DCAI/Bunny

A family of lightweight multimodal models that pair plug-and-play vision encoders with a range of small language-model backbones.

1.1K

Archived

Python

LLM Frameworks

LLM Wrappers & SDKs

#chatgpt#gpt-4#multimodal