Explore Projects

Discover 178 open source projects

Active filters (1):
Search: multimodality

Showing 21-40 of 178 projects

OthersideAI/self-operating-computer

A framework to enable multimodal AI models to control a computer, automating various tasks.

10.2K
Stable
Python
Agents & Orchestration
Python
#automation#openai#pyautogui

OpenGVLab/InternVL

An open-source, large language model-based multimodal dialogue system that achieves near-GPT-4o performance.

9.9K
Stable
Python
LLM Frameworks
Python
#gpt#llm#multimodal

gorse-io/gorse

An open-source recommender system engine for AI-focused developers that supports multimodal content via embeddings.

9.4K
Active
Go
Recommender System
#recommender-system#collaborative-filtering#knn

lancedb/lancedb

Open-source embedded retrieval library for building multimodal AI search and recommendation systems.

9.3K
Active
Rust
Similarity Search
Vector Databases
Rust
#approximate-nearest-neighbor-search#image-search#semantic-search

apache/seatunnel

A high-performance, distributed data integration tool for batch, streaming, and CDC use cases.

9.1K
Active
Java
ETL & Pipelines
Realtime
#data-integration#batch#streaming

xorbitsai/inference

Unified, production-ready inference API to run open-source language, speech, and multimodal models on cloud, on-prem, or your laptop.

9.1K
Active
Python
LLM Frameworks
Inference
PyTorch
#artificial-intelligence#llm#inference

facebookresearch/ImageBind

ImageBind is a multimodal learning framework that learns a single embedding space to represent diverse modalities like images, text, and more.

9.0K
Stable
Python
LLM Frameworks
Computer Vision
PyTorch
#multimodal-learning#computer-vision#embeddings

bentoml/BentoML

BentoML is an easy-to-use framework for building and deploying production-ready machine learning models as APIs.

8.5K
Active
Python
LLM Frameworks
API Clients & Testing
Python
#ai-inference#llm-inference#llm-serving

X-PLUG/MobileAgent

A powerful GUI agent family that enables mobile automation, multimodal interaction, and integration with AI tools.

8.0K
Stable
Python
AI Coding Agents
Agents & Orchestration
Python
#agent#automation#multimodal

open-mmlab/mmagic

An open-source, multi-purpose AI creation toolbox for text-to-image, image/video processing, and more.

7.4K
Archived
Jupyter Notebook
Computer Vision
Generative AI
PyTorch
#text-to-image#image-generation#image-processing

zai-org/GLM-4

Open-source multilingual multimodal chat language models for AI-powered chatbots and conversational agents.

7.1K
Experimental
Python
LLM Frameworks
LLM Wrappers & SDKs
Python
#chatglm#llm#multimodal

pliang279/awesome-multimodal-ml

A comprehensive reading list for research topics in multimodal machine learning.

6.8K
Archived
Computer Vision
Natural Language Processing
#multimodal-learning#reading-list#machine-learning

clovaai/donut

Donut is an OCR-free Document Understanding Transformer and Synthetic Document Generator for computer vision and document AI tasks.

6.8K
Archived
Python
React
#document-ai#computer-vision#open-source

zai-org/CogVLM

A state-of-the-art open visual language model for multimodal pretraining and applications.

6.7K
Archived
Python
LLM Frameworks
Computer Vision
Python
#cross-modality#language-model#multi-modal

TencentQQGYLab/AppAgent

An LLM-based multimodal agent framework designed to operate smartphone apps.

6.6K
Experimental
Python
Agents & Orchestration
LLM Frameworks
Python
#agent#chatgpt#generative-ai

SkalskiP/courses

A curated collection of links to courses and resources about Artificial Intelligence (AI).

6.4K
Archived
Python
LLM Frameworks
Computer Vision
Python
#artificial-intelligence#machine-learning#deep-learning

AI4Finance-Foundation/FinRobot

An open-source AI agent platform for financial analysis using large language models (LLMs).

6.3K
Active
Jupyter Notebook
LLM Frameworks
API Frameworks
Jupyter Notebook
#aiagent#chatgpt#finance

lance-format/lance

An open-source data format for building high-performance multimodal AI applications with fast random access, vector indexing, and data versioning.

6.1K
Active
Rust
LLM Frameworks
Databases
Rust
#data-format#data-versioning#vector-index

Blaizzy/mlx-audio

A high-performance text-to-speech, speech-to-text, and speech-to-speech library for Apple Silicon devices.

6.1K
Active
Python
AI Voice & Speech
CLI Tools
Apple MLX
#apple-silicon#speech-recognition#speech-synthesis

souzatharsis/podcastfy

An open-source Python tool to transform multimodal content into captivating multilingual audio podcasts powered by GenAI.

6.1K
Stable
Python
LLM Wrappers & SDKs
Audio & Speech
Python
#genai#audio-generation#podcast