Explore Projects

Discover 178 open source projects

Active filters (1):

Search: multimodal

Showing 61-80 of 178 projects

EvolvingLMMs-Lab/lmms-eval

A multimodal evaluation toolkit for assessing AI models across text, image, video, and audio tasks.

3.8K

Active

Python

LLM Frameworks

Agents & Orchestration

Python

#evaluation #multimodal #large-language-models

NVlabs/VILA

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

3.8K

Stable

Python

LLM Frameworks

Computer Vision

Python

#vision-language-model #multimodal-ai #edge-computing

microsoft/PhiCookBook

An open-source cookbook for getting started with Phi, a family of high-performance small language models from Microsoft.

3.7K

Active

Jupyter Notebook

LLM Frameworks

Books & Guides

#language-model #phi-models #small-language-model

NExT-GPT/NExT-GPT

Code and models for a multimodal large language model that can perform any-to-any tasks

3.6K

Experimental

Python

LLM Frameworks

Agents & Orchestration

PyTorch

#chatgpt #foundation-models #gpt-4

morphik-org/morphik-core

A comprehensive document search and storage platform for building AI applications using Python.

3.5K

Active

Python

LLM Frameworks

API Frameworks

Python

#artificial-intelligence #document-search #document-storage

OpenGVLab/InternGPT

InternGPT is an open-source demo platform that showcases various AI models, including DragGAN, ChatGPT, ImageBind, and multimodal chat.

3.2K

Archived

Python

LLM Frameworks

Agents & Orchestration

React

#chatgpt #draggan #imagebind

SkyworkAI/Skywork-R1V

An advanced multimodal AI model series for vision-language reasoning, developed by Skywork AI.

3.2K

Stable

Python

LLM Frameworks

Agents & Orchestration

Python

#multimodal #vision-language #reasoning

embeddings-benchmark/mteb

MTEB is a benchmark for evaluating and comparing text embedding models across multiple tasks and languages.

3.2K

Active

Python

LLM Wrappers & SDKs

Python

#benchmark #text-embedding #multilingual-nlp

microsoft/torchscale

Foundation architecture for (M)LLMs: a toolkit for building large language models.

3.1K

Archived

Python

LLM Frameworks

API Frameworks

Python

#large-language-models #transformer #multimodal

ictnlp/LLaMA-Omni

A high-quality end-to-end speech interaction model for AI-powered voice applications.

3.1K

Experimental

Python

LLM Frameworks

AI Voice & Speech

Python

#large-language-model #speech-interaction #speech-to-speech

docarray/docarray

A Python library for representing, sending, storing, and searching multimodal data in AI and ML applications.

3.1K

Active

Python

LLM Frameworks

Vector Databases

PyTorch

#cross-modal #multimodal #neural-search

vllm-project/vllm-omni

A Python framework for efficient model inference with omni-modality AI models.

2.9K

Active

Python

Inference

Multimodal

PyTorch

#audio-generation #diffusion #image-generation

InternLM/InternLM-XComposer

A comprehensive multimodal system for long-term streaming video and audio interactions using large language models.

2.9K

Experimental

Python

LLM Frameworks

Computer Vision

PyTorch

#chatgpt #gpt-4 #multimodal

Tencent-Hunyuan/HunyuanImage-3.0

Native multimodal model for high-quality image generation with text-to-image capabilities

2.9K

Active

Python

AI Image & Video

Local Inference Engines

PyTorch

#text-to-image #diffusion-model #multimodal

MeiGen-AI/MultiTalk

Multimodal conversational video generation powered by AI.

2.8K

Stable

Python

LLM Frameworks

Agents & Orchestration

Python

#ai-powered #multimodal #conversational

vortex-data/vortex

An extensible, high-performance columnar file format for data storage and processing.

2.8K

Active

Rust

Databases

Rust

#compression #multimodal #array

rom1504/clip-retrieval

Easily compute CLIP embeddings and build a CLIP-based retrieval system.

2.7K

Stable

Jupyter Notebook

Computer Vision

Inference

Jupyter Notebook

#clip #retrieval #computer-vision

datachain-ai/datachain

Comprehensive analytics, versioning, and ETL toolkit for multimodal data (video, audio, PDFs, images)

2.7K

Active

Python

Computer Vision

ETL & Pipelines

Python

#data-analytics #data-wrangling #embeddings

AlexxIT/XiaomiGateway3

A custom component for controlling Xiaomi Multimode Gateway and Aqara Hub devices on Home Assistant.

2.7K

Stable

Python

API Frameworks

Authentication

Home Assistant

#aqara #zigbee #mesh

NVlabs/MUNIT

MUNIT is a deep learning-based method for multimodal unsupervised image-to-image translation that generates diverse, stylized images.

2.7K

Archived

Python

Computer Vision

Animation & Motion

Python

#deep-learning #gan #image-translation
