Explore Projects

Discover 178 open source projects

Search: multimodal

Showing 21-40 of 178 projects

OthersideAI/self-operating-computer

A framework to enable multimodal AI models to control a computer, automating various tasks.

10.2K

Stable

Python

Agents & Orchestration

#automation#openai#pyautogui

OpenGVLab/InternVL

An open-source multimodal dialogue system built on large language models, which its authors report approaches GPT-4o performance.

9.9K

Stable

Python

LLM Frameworks

#gpt#llm#multimodal

gorse-io/gorse

An open-source recommender system engine for AI-focused developers that supports multimodal content through embeddings.

9.4K

Active

Recommender System

#recommender-system#collaborative-filtering#knn

lancedb/lancedb

Open-source embedded retrieval library for building multimodal AI search and recommendation systems.

9.3K

Active

Rust

Similarity Search

Vector Databases

#approximate-nearest-neighbor-search#image-search#semantic-search

apache/seatunnel

A high-performance, distributed data integration tool for batch, streaming, and CDC use cases.

9.1K

Active

Java

ETL & Pipelines

Realtime

#data-integration#batch#streaming

xorbitsai/inference

A unified, production-ready inference API for running open-source language, speech, and multimodal models in the cloud, on-prem, or on your laptop.

9.1K

Active

Python

LLM Frameworks

Inference

PyTorch

#artificial-intelligence#llm#inference

facebookresearch/ImageBind

ImageBind is a multimodal learning framework that learns a single joint embedding space across six modalities: images, text, audio, depth, thermal, and IMU data.

9.0K

Stable

Python

LLM Frameworks

Computer Vision

PyTorch

#multimodal-learning#computer-vision#embeddings

bentoml/BentoML

BentoML is an easy-to-use framework for building and deploying production-ready machine learning models as APIs.

8.5K

Active

Python

LLM Frameworks

API Clients & Testing

#ai-inference#llm-inference#llm-serving

X-PLUG/MobileAgent

A powerful GUI agent family that enables mobile automation, multimodal interaction, and integration with AI tools.

8.0K

Stable

Python

AI Coding Agents

Agents & Orchestration

#agent#automation#multimodal

open-mmlab/mmagic

An open-source, multi-purpose AI creation toolbox for text-to-image, image/video processing, and more.

7.4K

Archived

Jupyter Notebook

Computer Vision

Generative AI

PyTorch

#text-to-image#image-generation#image-processing

zai-org/GLM-4

Open-source multilingual multimodal chat language models for AI-powered chatbots and conversational agents.

7.1K

Experimental

Python

LLM Frameworks

LLM Wrappers & SDKs

#chatglm#llm#multimodal

pliang279/awesome-multimodal-ml

A comprehensive reading list for research topics in multimodal machine learning.

6.8K

Archived

Computer Vision

Natural Language Processing

#multimodal-learning#reading-list#machine-learning

clovaai/donut

Donut is an OCR-free Document Understanding Transformer and Synthetic Document Generator for computer vision and document AI tasks.

6.8K

Archived

Python

React

#document-ai#computer-vision#open-source

zai-org/CogVLM

A state-of-the-art open visual language model for multimodal pretraining and applications.

6.7K

Archived

Python

LLM Frameworks

Computer Vision

#cross-modality#language-model#multi-modal

TencentQQGYLab/AppAgent

An LLM-based multimodal agent framework designed to operate smartphone apps.

6.6K

Experimental

Python

Agents & Orchestration

LLM Frameworks

#agent#chatgpt#generative-ai

SkalskiP/courses

A curated collection of links to courses and resources about artificial intelligence (AI).

6.4K

Archived

Python

LLM Frameworks

Computer Vision

#artificial-intelligence#machine-learning#deep-learning

AI4Finance-Foundation/FinRobot

An open-source AI agent platform for financial analysis using large language models (LLMs).

6.3K

Active

Jupyter Notebook

LLM Frameworks

API Frameworks

#aiagent#chatgpt#finance

lance-format/lance

An open-source data format for building high-performance multimodal AI applications with fast random access, vector indexing, and data versioning.

6.1K

Active

Rust

LLM Frameworks

Databases

#data-format#data-versioning#vector-index

Blaizzy/mlx-audio

A high-performance text-to-speech, speech-to-text, and speech-to-speech library for Apple Silicon devices.

6.1K

Active

Python

AI Voice & Speech

CLI Tools

Apple MLX

#apple-silicon#speech-recognition#speech-synthesis

souzatharsis/podcastfy

An open-source Python tool to transform multimodal content into captivating multilingual audio podcasts powered by GenAI.

6.1K

Stable

Python

LLM Wrappers & SDKs

Audio & Speech

#genai#audio-generation#podcast
