Showing 21-40 of 178 projects
A framework to enable multimodal AI models to control a computer, automating various tasks.
An open-source, large language model-based multimodal dialogue system that achieves near-GPT-4o performance.
An open-source recommender system engine that supports multimodal content via embedding for AI-focused developers.
Open-source embedded retrieval library for building multimodal AI search and recommendation systems.
A high-performance, distributed data integration tool for batch, streaming, and CDC use cases.
Unified, production-ready inference API to run open-source, speech, and multimodal models on cloud, on-prem, or your laptop.
ImageBind is a multimodal learning framework that learns a single embedding space to represent diverse modalities like images, text, and more.
BentoML is an easy-to-use framework for building and deploying production-ready machine learning models as APIs.
A powerful GUI agent family that enables mobile automation, multimodal interaction, and integration with AI tools.
An open-source, multi-purpose AI creation toolbox for text-to-image, image/video processing, and more.
Open-source multilingual multimodal chat language models for AI-powered chatbots and conversational agents.
A comprehensive reading list for research topics in multimodal machine learning.
Donut is an OCR-free Document Understanding Transformer and Synthetic Document Generator for computer vision and document AI tasks.
A state-of-the-art open visual language model for multimodal pretraining and applications.
An LLM-based multimodal agent framework designed to operate smartphone apps.
This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI).
An open-source AI agent platform for financial analysis using large language models (LLMs).
An open-source data format for building high-performance multimodal AI applications with fast random access, vector indexing, and data versioning.
A high-performance text-to-speech, speech-to-text, and speech-to-speech library for Apple Silicon devices.
An open-source Python tool to transform multimodal content into captivating multilingual audio podcasts powered by GenAI.