Explore Projects

Discover 12 open-source projects

Active filters (1):
Search: multimodal-large-language-models

Showing 1-12 of 12 projects

BradyFU/Awesome-Multimodal-Large-Language-Models

A collection of multimodal large language models and their latest advances.

17.4K
Active
React
#large-language-models #multimodal-chain-of-thought #in-context-learning

X-PLUG/MobileAgent

A powerful GUI agent family that enables mobile automation, multimodal interaction, and integration with AI tools.

8.0K
Stable
Python
AI Coding Agents
Agents & Orchestration
#agent #automation #multimodal

ictnlp/LLaMA-Omni

A high-quality end-to-end speech interaction model for AI-powered voice applications.

3.1K
Experimental
Python
LLM Frameworks
AI Voice & Speech
#large-language-model #speech-interaction #speech-to-speech

VITA-MLLM/VITA

A powerful multimodal AI model for real-time vision and speech interaction, built for developers who work with AI tools.

2.5K
Experimental
Python
LLM Frameworks
Agents & Orchestration
#large-language-model #multimodal #video-understanding

X-PLUG/mPLUG-DocOwl

A modular multimodal large language model for advanced document understanding and analysis.

2.4K
Experimental
Python
LLM Frameworks
RAG Frameworks
#document-understanding #table-understanding #chart-understanding

cambrian-mllm/cambrian

Cambrian-1 is a multimodal LLM with a vision-centric design for building AI-powered chatbots and applications.

2.0K
Stable
Python
LLM Frameworks
Computer Vision
#chatbot #computer-vision #large-language-models

YangLing0818/RPG-DiffusionMaster

A powerful text-to-image diffusion model that can be used for recaptioning, planning, and generating with multimodal LLMs.

1.8K
Experimental
Jupyter Notebook
LLM Frameworks
Computer Vision
#text-to-image #image-editing #diffusion

ByteDance-Seed/Seed1.5-VL

A powerful vision-language foundation model designed to advance multimodal AI understanding and reasoning.

1.6K
Experimental
Jupyter Notebook
LLM Frameworks
Computer Vision
#multimodal-ai #vision-language-model #large-language-model

AIDC-AI/Ovis

A novel Multimodal Large Language Model (MLLM) architecture for structurally aligning visual and textual embeddings.

1.4K
Stable
Python
LLM Frameworks
Vision-Language Model
#chatbot #llama3 #multimodal

Henry-23/VideoChat

Real-time voice interactive digital human with customizable appearance and voice, supporting voice cloning.

1.2K
Stable
Python
React
#authentication #streaming #real-time

AIDC-AI/Awesome-Unified-Multimodal-Models

A curated collection of awesome unified multimodal models for text-to-image generation and vision-language tasks.

1.1K
Active
LLM Frameworks
Computer Vision
#multimodal-models #text-to-image-generation #vision-language-model

BAAI-DCAI/Bunny

A family of lightweight multimodal models for ChatGPT, GPT-4, and other large language models.

1.1K
Archived
Python
LLM Frameworks
LLM Wrappers & SDKs
#chatgpt #gpt-4 #multimodal
