Explore Projects

Discover 17 open-source projects

Active filters (1):
Search: mllm

Showing 1-17 of 17 projects

datawhalechina/self-llm

Comprehensive guide for Chinese developers to deploy and fine-tune open-source LLMs on Linux

28.7K
Active
Jupyter Notebook
Fine-tuning
Tutorials & Courses
Linux
#llm#fine-tuning#linux

OpenBMB/MiniCPM-o

On-device multimodal LLM for vision, speech, and live streaming on phones

24.0K
Active
Python
Inference
Local Inference Engines
llama.cpp-omni
#minicpm-o#multimodal-llm#on-device-ai

microsoft/unilm

Microsoft's research repo for large-scale self-supervised pre-training across tasks, languages, and modalities

22.0K
Active
Python
LLM Frameworks
#foundation-models#multimodal-ai#nlp

simular-ai/Agent-S

An open agentic framework that enables computers to act like humans

10.0K
Active
Python
React
#agent-computer-interface#in-context-reinforcement-learning#grounding

X-PLUG/MobileAgent

A powerful GUI agent family that enables mobile automation, multimodal interaction, and integration with AI tools.

8.0K
Stable
Python
AI Coding Agents
Agents & Orchestration
Python
#agent#automation#multimodal

ant-research/MagicQuill

MagicQuill is an intelligent, interactive, AI-powered image editing system (CVPR'25).

3.7K
Stable
Python
Computer Vision
Component Libraries (React)
React
#aigc#gradio#image-editing

NExT-GPT/NExT-GPT

Code and models for a multimodal large language model that can perform any-to-any tasks

3.6K
Experimental
Python
LLM Frameworks
Agents & Orchestration
PyTorch
#chatgpt#foundation-models#gpt-4

InternLM/InternLM-XComposer

A comprehensive multimodal system for long-term streaming video and audio interactions using large language models.

2.9K
Experimental
Python
LLM Frameworks
Computer Vision
PyTorch
#chatgpt#gpt-4#multimodal

VITA-MLLM/VITA

A powerful multimodal AI model for real-time vision and speech interaction, built for developers who work with AI tools.

2.5K
Experimental
Python
LLM Frameworks
Agents & Orchestration
Python
#large-language-model#multimodal#video-understanding

X-PLUG/mPLUG-DocOwl

A modular multimodal large language model for advanced document understanding and analysis.

2.4K
Experimental
Python
LLM Frameworks
RAG Frameworks
Python
#document-understanding#table-understanding#chart-understanding

cambrian-mllm/cambrian

Cambrian-1 is a multimodal LLM with a vision-centric design for building AI-powered chatbots and applications.

2.0K
Stable
Python
LLM Frameworks
Computer Vision
Python
#chatbot#computer-vision#large-language-models

coderonion/awesome-yolo-object-detection

A curated collection of YOLO object detection projects and datasets for developers working with computer vision and AI.

1.7K
Experimental
Computer Vision
Datasets
#object-detection#yolo#datasets

AIDC-AI/Ovis

A novel Multimodal Large Language Model (MLLM) architecture for structurally aligning visual and textual embeddings.

1.4K
Stable
Python
LLM Frameworks
Vision-Language Model
Python
#chatbot#llama3#multimodal

UbiquitousLearning/mllm

Fast Multimodal LLM on Mobile Devices: A C++ library for running large language models on mobile devices

1.4K
Active
C++
LLM Frameworks
Cross-Platform
#llm#mobile#multimodal

HJYao00/Mulberry

Mulberry is an o1-like Reasoning and Reflection MLLM implemented via Collective MCTS for AI-powered coding tools.

1.2K
Active
Python
LLM Frameworks
AI Code Generation
Python
#ai-coding#llm#mcts

IDEA-Research/Rex-Omni

A next-generation object detection model that can detect anything with high accuracy and efficiency.

1.2K
Active
Jupyter Notebook
Computer Vision
LLM Frameworks
Jupyter Notebook
#object-detection#open-set#mllm

BAAI-DCAI/Bunny

A family of lightweight multimodal models for ChatGPT, GPT-4, and other large language models.

1.1K
Archived
Python
LLM Frameworks
LLM Wrappers & SDKs
Python
#chatgpt#gpt-4#multimodal
