A comprehensive guide for Chinese developers on deploying and fine-tuning open-source LLMs on Linux.
An on-device multimodal LLM for vision, speech, and live streaming on phones.
Microsoft's research repo for large-scale self-supervised pre-training across tasks, languages, and modalities.
An open agentic framework that uses computers like a human.
A powerful family of GUI agents for mobile automation and multimodal interaction.
MagicQuill is an intelligent, interactive image editing system (CVPR 2025).
Code and models for a multimodal large language model that can perform any-to-any tasks.
A comprehensive multimodal system for long-term streaming video and audio interactions using large language models.
A powerful multimodal AI model for real-time vision and speech interaction.
A modular multimodal large language model for advanced document understanding and analysis.
Cambrian-1 is a multimodal LLM with a vision-centric design.
A curated collection of YOLO object detection projects and datasets.
A novel Multimodal Large Language Model (MLLM) architecture for structurally aligning visual and textual embeddings.
A C++ library for fast multimodal LLM inference on mobile devices.
Mulberry is an o1-like reasoning and reflection MLLM built via Collective Monte Carlo Tree Search (MCTS).
A next-generation object detection model that can detect anything with high accuracy and efficiency.
A family of lightweight multimodal models.