Explore Projects

Discover 178 open source projects

Active filters (1):
Search: multimodality

Showing 61-80 of 178 projects

EvolvingLMMs-Lab/lmms-eval

A multimodal evaluation toolkit for assessing AI models across text, image, video, and audio tasks.

3.8K
Active
Python
LLM Frameworks
Agents & Orchestration
Python
#evaluation#multimodal#large-language-models

NVlabs/VILA

VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.

3.8K
Stable
Python
LLM Frameworks
Computer Vision
Python
#vision-language-model#multimodal-ai#edge-computing

microsoft/PhiCookBook

An open-source cookbook for getting started with Phi, a family of high-performance small language models from Microsoft.

3.7K
Active
Jupyter Notebook
LLM Frameworks
Books & Guides
#language-model#phi-models#small-language-model

NExT-GPT/NExT-GPT

Code and models for a multimodal large language model that can perform any-to-any tasks

3.6K
Experimental
Python
LLM Frameworks
Agents & Orchestration
PyTorch
#chatgpt#foundation-models#gpt-4

morphik-org/morphik-core

A comprehensive document search and storage platform for building AI applications using Python.

3.5K
Active
Python
LLM Frameworks
API Frameworks
Python
#artificial-intelligence#document-search#document-storage

OpenGVLab/InternGPT

InternGPT is an open-source demo platform that showcases various AI models, including DragGAN, ChatGPT, ImageBind, and multimodal chat.

3.2K
Archived
Python
LLM Frameworks
Agents & Orchestration
React
#chatgpt#draggan#imagebind

SkyworkAI/Skywork-R1V

An advanced multimodal AI model series for vision-language reasoning, developed by Skywork AI.

3.2K
Stable
Python
LLM Frameworks
Agents & Orchestration
Python
#multimodal#vision-language#reasoning

embeddings-benchmark/mteb

MTEB is a benchmark for evaluating and comparing text embedding models across multiple tasks and languages.

3.2K
Active
Python
LLM Wrappers & SDKs
Search
Python
#benchmark#text-embedding#multilingual-nlp

microsoft/torchscale

Foundation Architecture for (M)LLMs, a powerful toolkit for building large language models.

3.1K
Archived
Python
LLM Frameworks
API Frameworks
Python
#large-language-models#transformer#multimodal

ictnlp/LLaMA-Omni

A high-quality end-to-end speech interaction model for AI-powered voice applications.

3.1K
Experimental
Python
LLM Frameworks
AI Voice & Speech
Python
#large-language-model#speech-interaction#speech-to-speech

docarray/docarray

A Python library for representing, sending, storing, and searching multimodal data in AI and ML applications.

3.1K
Active
Python
LLM Frameworks
Vector Databases
PyTorch
#cross-modal#multimodal#neural-search

vllm-project/vllm-omni

A Python framework for efficient model inference with omni-modality AI models.

2.9K
Active
Python
Inference
Multimodal
PyTorch
#audio-generation#diffusion#image-generation

InternLM/InternLM-XComposer

A comprehensive multimodal system for long-term streaming video and audio interactions using large language models.

2.9K
Experimental
Python
LLM Frameworks
Computer Vision
PyTorch
#chatgpt#gpt-4#multimodal

Tencent-Hunyuan/HunyuanImage-3.0

Native multimodal model for high-quality image generation with text-to-image capabilities

2.9K
Active
Python
AI Image & Video
Local Inference Engines
PyTorch
#text-to-image#diffusion-model#multimodal

MeiGen-AI/MultiTalk

Multimodal conversational video generation powered by AI.

2.8K
Stable
Python
LLM Frameworks
Agents & Orchestration
Python
#ai-powered#multimodal#conversational

vortex-data/vortex

An extensible, high-performance columnar file format for data storage and processing.

2.8K
Active
Rust
Databases
Search
Rust
#compression#multimodal#array

rom1504/clip-retrieval

Easily compute CLIP embeddings and build a CLIP-based retrieval system.

2.7K
Stable
Jupyter Notebook
Computer Vision
Inference
Jupyter Notebook
#clip#retrieval#computer-vision

datachain-ai/datachain

Comprehensive analytics, versioning, and ETL toolkit for multimodal data (video, audio, PDFs, images)

2.7K
Active
Python
Computer Vision
ETL & Pipelines
Python
#data-analytics#data-wrangling#embeddings

AlexxIT/XiaomiGateway3

A custom component for controlling Xiaomi Multimode Gateway and Aqara Hub devices on Home Assistant.

2.7K
Stable
Python
API Frameworks
Authentication
Home Assistant
#aqara#zigbee#mesh

NVlabs/MUNIT

MUNIT is a deep learning-based method for multimodal unsupervised image-to-image translation that generates diverse, stylized images.

2.7K
Archived
Python
Computer Vision
Animation & Motion
Python
#deep-learning#gan#image-translation
