Explore Projects

Discover 53 open source projects

Active filters (1):

Search: multi-modality

Showing 21-40 of 53 projects

ThilinaRajapakse/simpletransformers

A Python library that provides a simple interface for using popular NLP models like BERT, GPT-2, XLNet, and T5 for various tasks.

4.2K

Stable

Python

LLM Frameworks

Text Classification

Python

#transformers #natural-language-processing #text-classification

open-compass/VLMEvalKit

Open-source toolkit for evaluating large multi-modal AI models, supporting 220+ models and 80+ benchmarks.

3.9K

Active

Python

LLM Frameworks

LLM Wrappers & SDKs

PyTorch

#chatgpt #llm #multi-modal

PKU-YuanGroup/Video-LLaVA

A large-scale vision-language model for video understanding and generation.

3.5K

Archived

Python

LLM Frameworks

Computer Vision

Python

#large-vision-language-model #video-understanding #multi-modal

JIA-Lab-research/MGM

The official repository for the 'Mini-Gemini' model, a multi-modal vision-language model for generation tasks.

3.3K

Archived

Python

LLM Frameworks

Computer Vision

Python

#generation #large-language-models #vision-language-model

aurelio-labs/semantic-router

A Python library that provides superfast AI decision-making and intelligent multi-modal data processing.

3.3K

Stable

Python

LLM Frameworks

Agents & Orchestration

#ai #artificial-intelligence #computer-vision

docarray/docarray

A Python library for representing, sending, storing, and searching multimodal data in AI and ML applications.

3.1K

Active

Python

LLM Frameworks

Vector Databases

PyTorch

#cross-modal #multimodal #neural-search

InternLM/InternLM-XComposer

A comprehensive multimodal system for long-term streaming video and audio interactions using large language models.

2.9K

Experimental

Python

LLM Frameworks

Computer Vision

PyTorch

#chatgpt #gpt-4 #multimodal

modelscope/3D-Speaker

A library for single- and multi-modal speaker verification, recognition, and diarization.

2.8K

Stable

Python

Computer Vision

AI Voice & Speech

Python

#speaker-verification #speaker-recognition #speaker-diarization

JIA-Lab-research/LISA

A reasoning-based segmentation method using large language models.

2.6K

Experimental

Python

LLM Frameworks

Agents & Orchestration

Python

#large-language-model #llm #multi-modal

opentripplanner/OpenTripPlanner

An open source multi-modal trip planner.

2.6K

Active

Java

React

#trip-planner #AI-powered #open-source

X-PLUG/mPLUG-Owl

A powerful multi-modal large language model family for building advanced AI chatbots and visual recognition models.

2.5K

Experimental

Python

LLM Frameworks

Computer Vision

PyTorch

#chatbot #gpt #multimodal

zai-org/CogVLM2

An open-source multi-modal AI model based on Llama-3-8B.

2.4K

Experimental

Python

LLM Frameworks

Computer Vision

Python

#open-source #multi-modal #language-model

Yuliang-Liu/Monkey

A Python library for working with large multi-modal models, focusing on image resolution and text labeling.

1.9K

Active

Python

Computer Vision

MLOps

#computer-vision #multi-modal-models #image-resolution

black0017/MedicalZooPytorch

A PyTorch-based deep learning framework for 2D/3D medical image segmentation.

1.9K

Archived

Python

Computer Vision

API Frameworks

PyTorch

#medical-imaging #image-segmentation #deep-learning

OpenMotionLab/MotionGPT

MotionGPT is a unified motion-language generation model that can generate human motion using large language models.

1.9K

Experimental

Python

LLM Frameworks

Motion Generation

Python

#motion-generation #text-to-motion #chatgpt

Kav-K/GPTDiscord

A robust, all-in-one GPT interface for Discord with chatbot, image generation, moderation, and more.

1.8K

Archived

Python

LLM Wrappers & SDKs

Authentication

asyncio

#chatbot #dalle2 #discord-bot

IntelLabs/fastRAG

An efficient retrieval augmentation and generation framework for multi-modal information retrieval and question answering.

1.8K

Active

Python

LLM Frameworks

RAG Frameworks

PyTorch

#information-retrieval #question-answering #multi-modal

dingodb/dingo

A high-performance, MySQL-compatible vector database that supports structured and unstructured data for AI-driven applications.

1.7K

Active

Java

Vector Databases

API Frameworks

#vector-database #mysql-compatibility #structured-data

ByteDance-Seed/VeOmni

VeOmni is a scalable distributed training framework for multi-modal AI models.

1.7K

Active

Python

MCP Frameworks

MLOps

Python

#distributed-training #multi-modal-ai #model-centric

lyuchenyang/Macaw-LLM

Macaw-LLM is a multi-modal language modeling framework that integrates image, video, audio, and text data.

1.6K

Archived

Python

LLM Frameworks

Computer Vision

PyTorch

#multi-modal-learning #deep-learning #natural-language-processing
