Explore Projects

Discover 178 open source projects

Active filters (1):

Search: multimodal

Showing 81-100 of 178 projects

jonyzhang2023/awesome-embodied-vla-va-vln

A curated list of state-of-the-art research in embodied AI, focusing on VLA, VLN, and related multimodal learning approaches.

2.6K

Active

Computer Vision

Agents & Orchestration

#embodied-ai #vision-language-action #vision-language-navigation

autodistill/autodistill

An AI-powered tool for training supervised models without manual labeling, using foundation models and multimodal learning.

2.6K

Experimental

Python

Computer Vision

Model Distillation

PyTorch

#auto-labeling #computer-vision #foundation-models

xlang-ai/OSWorld

A benchmark for multimodal AI agents to tackle open-ended tasks in real computer environments.

2.6K

Active

Python

Agents & Orchestration

Benchmark

Python

#multimodal-ai #agent-benchmarking #open-ended-tasks

FreedomIntelligence/Awesome-AI4Med

A curated list of medical AI/ML resources, including LLMs, datasets, and benchmarks.

2.6K

Active

LLM Frameworks

Datasets

#medical #llms #datasets

OFA-Sys/OFA

Official repository for the OFA (Unifying Architectures, Tasks, and Modalities) AI model, supporting various vision-language tasks.

2.6K

Archived

Python

LLM Frameworks

Computer Vision

PyTorch

#pretrained-models #multimodal #vision-language

KimMeen/Time-LLM

An official implementation of a time series forecasting model using large language models.

2.5K

Stable

Python

LLM Frameworks

Time Series

Python

#time-series-forecasting #large-language-models #deep-learning

X-PLUG/mPLUG-Owl

A powerful multi-modal large language model family for building advanced AI chatbots and visual recognition models.

2.5K

Experimental

Python

LLM Frameworks

Computer Vision

PyTorch

#chatbot #gpt #multimodal

VITA-MLLM/VITA

A powerful multimodal AI model for real-time vision and speech interaction, built for developers who work with AI tools.

2.5K

Experimental

Python

LLM Frameworks

Agents & Orchestration

Python

#large-language-model #multimodal #video-understanding

chatdoc-com/OCRFlux

OCRFlux is a powerful PDF-to-Markdown conversion toolkit with advanced layout handling, table parsing, and cross-page content merging.

2.5K

Experimental

Python

Computer Vision

API Frameworks

Python

#pdf-conversion #markdown-generation #layout-handling

InternLM/HuixiangDou

A Python-based assistant tool for improving group chat experiences using large language models.

2.5K

Stable

Python

LLM Frameworks

Agents & Orchestration

#chatbot #group-chat #multimodal

BAAI-Agents/Cradle

A framework for building AI agents with strong reasoning abilities, self-improvement, and skill curation in a general computing environment.

2.5K

Archived

Python

Agents & Orchestration

LLM Frameworks

Python

#ai-agents #llm #general-computer-control

Stability-AI/stability-sdk

SDK for interacting with the Stability.AI APIs, including Stable Diffusion inference.

2.4K

Stable

Jupyter Notebook

AI SDKs & Wrappers

API Clients & Testing

Jupyter Notebook

#ai-art #generative-art #latent-diffusion

Yutong-Zhou-cv/Awesome-Text-to-Image

A curated list of resources on text-to-image generation and synthesis.

2.4K

Active

Computer Vision

Tutorials & Courses

React

#text-to-image #computer-vision #generative-adversarial-networks

OmniSVG/OmniSVG

An end-to-end multimodal SVG generator that leverages pre-trained Vision-Language Models to create complex and detailed SVGs.

2.4K

Active

Python

LLM Frameworks

Animation & Motion

Python

#svg-generation #vision-language-models #multimodal-ai

X-PLUG/mPLUG-DocOwl

A modular multimodal large language model for advanced document understanding and analysis.

2.4K

Experimental

Python

LLM Frameworks

RAG Frameworks

Python

#document-understanding #table-understanding #chart-understanding

JIA-Lab-research/DreamOmni2

An open-source implementation of a multimodal instruction-based editing and generation model.

2.3K

Stable

Python

LLM Frameworks

Computer Vision

Python

#image-editing #image-generation #unified-model

OpenGVLab/InternVideo

A video foundation model and dataset for multimodal and video understanding tasks.

2.2K

Stable

Python

Computer Vision

Datasets

PyTorch

#video-understanding #multimodal #foundation-models

zai-org/GLM-V

A scalable multimodal reasoning framework for AI-powered applications with a focus on video and image understanding.

2.2K

Active

Python

LLM Frameworks

Agents & Orchestration

Python

#multimodal-reasoning #video-understanding #image-to-text

KaiyangZhou/CoOp

A prompt learning framework for vision-language models.

2.2K

Archived

Python

React

#prompt-engineering #multimodal-learning #foundation-models

elevenlabs/ui

A component library and registry built on shadcn/ui to help you build multimodal AI agents faster.

2.1K

Active

TypeScript

AI Agents

Component Libraries (React)

React

#ai #agents #components
