Explore Projects

Discover 178 open source projects

Active filters (1):
Search: multimodal

Showing 81-100 of 178 projects

jonyzhang2023/awesome-embodied-vla-va-vln

A curated list of state-of-the-art research in embodied AI, focusing on VLA, VLN, and related multimodal learning approaches.

2.6K
Active
Computer Vision
Agents & Orchestration
#embodied-ai #vision-language-action #vision-language-navigation

autodistill/autodistill

An AI-powered tool for training supervised models without manual labeling, using foundation models and multimodal learning.

2.6K
Experimental
Python
Computer Vision
Model Distillation
PyTorch
#auto-labeling #computer-vision #foundation-models

xlang-ai/OSWorld

A benchmark for multimodal AI agents to tackle open-ended tasks in real computer environments.

2.6K
Active
Python
Agents & Orchestration
Benchmark
Python
#multimodal-ai #agent-benchmarking #open-ended-tasks

FreedomIntelligence/Awesome-AI4Med

A curated list of medical AI/ML resources, including LLMs, datasets, and benchmarks.

2.6K
Active
LLM Frameworks
Datasets
#medical #llms #datasets

OFA-Sys/OFA

Official repository for the OFA (Unifying Architectures, Tasks, and Modalities) AI model, supporting various vision-language tasks.

2.6K
Archived
Python
LLM Frameworks
Computer Vision
PyTorch
#pretrained-models #multimodal #vision-language

KimMeen/Time-LLM

An official implementation of a time series forecasting model using large language models.

2.5K
Stable
Python
LLM Frameworks
Time Series
Python
#time-series-forecasting #large-language-models #deep-learning

X-PLUG/mPLUG-Owl

A powerful multi-modal large language model family for building advanced AI chatbots and visual recognition models.

2.5K
Experimental
Python
LLM Frameworks
Computer Vision
PyTorch
#chatbot #gpt #multimodal

VITA-MLLM/VITA

A powerful multimodal AI model for real-time vision and speech interaction, built for developers who work with AI tools.

2.5K
Experimental
Python
LLM Frameworks
Agents & Orchestration
Python
#large-language-model #multimodal #video-understanding

chatdoc-com/OCRFlux

OCRFlux is a powerful PDF-to-Markdown conversion toolkit with advanced layout handling, table parsing, and cross-page content merging.

2.5K
Experimental
Python
Computer Vision
API Frameworks
Python
#pdf-conversion #markdown-generation #layout-handling

InternLM/HuixiangDou

A Python-based assistant tool for improving group chat experiences using large language models.

2.5K
Stable
Python
LLM Frameworks
Agents & Orchestration
#chatbot #group-chat #multimodal

BAAI-Agents/Cradle

A framework for building AI agents with strong reasoning abilities, self-improvement, and skill curation in a general computing environment.

2.5K
Archived
Python
Agents & Orchestration
LLM Frameworks
Python
#ai-agents #llm #general-computer-control

Stability-AI/stability-sdk

SDK for interacting with the Stability.AI APIs, including Stable Diffusion inference.

2.4K
Stable
Jupyter Notebook
AI SDKs & Wrappers
API Clients & Testing
Jupyter Notebook
#ai-art #generative-art #latent-diffusion

Yutong-Zhou-cv/Awesome-Text-to-Image

A curated list of resources on text-to-image generation and synthesis, useful for AI-focused developers.

2.4K
Active
Computer Vision
Tutorials & Courses
React
#text-to-image #computer-vision #generative-adversarial-networks

OmniSVG/OmniSVG

An end-to-end multimodal SVG generator that leverages pre-trained Vision-Language Models to create complex and detailed SVGs.

2.4K
Active
Python
LLM Frameworks
Animation & Motion
Python
#svg-generation #vision-language-models #multimodal-ai

X-PLUG/mPLUG-DocOwl

A modular multimodal large language model for advanced document understanding and analysis.

2.4K
Experimental
Python
LLM Frameworks
RAG Frameworks
Python
#document-understanding #table-understanding #chart-understanding

JIA-Lab-research/DreamOmni2

An open-source implementation of a multimodal instruction-based editing and generation model.

2.3K
Stable
Python
LLM Frameworks
Computer Vision
Python
#image-editing #image-generation #unified-model

OpenGVLab/InternVideo

A video foundation model and dataset for multimodal and video understanding tasks.

2.2K
Stable
Python
Computer Vision
Datasets
PyTorch
#video-understanding #multimodal #foundation-models

zai-org/GLM-V

A scalable multimodal reasoning framework for AI-powered applications with a focus on video and image understanding.

2.2K
Active
Python
LLM Frameworks
Agents & Orchestration
Python
#multimodal-reasoning #video-understanding #image-to-text

KaiyangZhou/CoOp

A prompt learning framework for vision-language models.

2.2K
Archived
Python
React
#prompt-engineering #multimodal-learning #foundation-models

elevenlabs/ui

A component library and registry built on shadcn/ui to help you build multimodal AI agents faster.

2.1K
Active
TypeScript
AI Agents
Component Libraries (React)
React
#ai #agents #components
