Explore Projects

Discover 178 open source projects

Active filters (1):
Search: multimodal

Showing 41-60 of 178 projects

multimodal-art-projection/YuE

Open-source full-song music generation foundation model for developers building AI-powered audio applications.

6.1K
Experimental
Python
LLM Frameworks
Audio Generation
PyTorch
#music-generation #audio-generation #deep-learning

11cafe/jaaz

An open-source multimodal creative assistant that prioritizes privacy and can be used locally.

5.9K
Stable
TypeScript
AI Agents
Agents & Orchestration
React
#ai-agent #ai-image-generator #privacy-focused

om-ai-lab/VLM-R1

A Python-based library for solving visual understanding tasks using reinforced vision-language models (VLMs).

5.9K
Stable
Python
LLM Frameworks
Computer Vision
Python
#deepseek-r1 #multimodal #reinforcement-learning

ByteDance-Seed/Bagel

An open-source unified multimodal model for developers to build with AI tools.

5.7K
Stable
Python
React
#multimodal model #AI coding tools #open-source

PySpur-Dev/pyspur

A visual playground for agentic workflows to iterate over agents 10x faster with AI tools and LLMs.

5.7K
Experimental
TypeScript
Agents & Orchestration
LLM Frameworks
React
#agents #llms #workflow

facebookresearch/mmf

A modular deep learning framework for multimodal AI research and applications from Facebook AI Research (FAIR).

5.6K
Active
Python
LLM Frameworks
Computer Vision
PyTorch
#deep-learning #multimodal #captioning

firebase/genkit

Open-source framework for building AI-powered apps in JavaScript, Go, and Python, used in production by Google.

5.6K
Active
TypeScript
LLM Frameworks
Agents & Orchestration
TypeScript
#ai #agents #llm

karpathy/neuraltalk

A Python library for learning Multimodal Recurrent Neural Networks that describe images with sentences.

5.5K
Archived
Python
Prompt Engineering
React
#multimodal #recurrent #neural networks

OpenBMB/UltraRAG

A low-code MCP framework for building complex and innovative RAG pipelines with AI tools.

5.4K
Active
Python
MCP Frameworks
RAG & Vector
Flask
#multimodal #low-code #rag

Eventual-Inc/Daft

High-performance data engine for AI and multimodal workloads, processing images, audio, video, and structured data at scale.

5.3K
Active
Rust
ML Ops
ETL & Pipelines
Rust
#ai-engineering #data-engineering #distributed

deepseek-ai/DeepSeek-VL2

A powerful mixture-of-experts vision-language model for advanced multimodal understanding.

5.2K
Experimental
Python
LLM Frameworks
Computer Vision
Python
#computer-vision #multimodal-ai #language-models

ParisNeo/lollms-webui

A web UI for interacting with large language models and multimodal AI systems.

4.8K
Active
CSS
LLM Frameworks
Frontend Frameworks
React
#large-language-models #multimodal-ai #web-ui

luban-agi/Awesome-AIGC-Tutorials

Curated tutorials and resources for Large Language Models, AI Painting, and more.

4.5K
Archived
LLM Frameworks
Computer Vision
#ai #llm #prompt-engineering

rom1504/img2dataset

Easily convert large sets of image URLs into a dataset for AI/ML training and experimentation.

4.4K
Stable
Python
Computer Vision
Databases
Python
#dataset #image-processing #big-data

fixie-ai/ultravox

A fast, multimodal LLM for real-time voice applications and AI-powered speech tools.

4.4K
Stable
Python
LLM Frameworks
AI Voice & Speech
Python
#llm #speech-recognition #text-to-speech

datawhalechina/all-in-rag

A comprehensive guide to using the RAG (Retrieval-Augmented Generation) technique for large language model applications.

4.3K
Active
Python
LLM Frameworks
RAG Frameworks
Python
#ai #langchain #llm

amazon-science/mm-cot

Official implementation of a paper on multimodal chain-of-thought reasoning in language models.

4.0K
Archived
Python
LLM Frameworks
Agents & Orchestration
Python
#language-models #multimodal #reasoning

QwenLM/Qwen2.5-Omni

An end-to-end multimodal AI model that understands text, audio, images, and video and generates text and speech in real time.

3.9K
Experimental
Jupyter Notebook
LLM Frameworks
AI Voice & Speech
Jupyter Notebook
#multimodal #text-to-speech #speech-recognition

open-mmlab/mmpretrain

A pre-training toolbox and benchmark for vision AI models, including self-supervised learning and state-of-the-art architectures.

3.8K
Archived
Python
Computer Vision
ML Ops
PyTorch
#computer-vision #self-supervised-learning #pre-training

jina-ai/discoart

A Python library for creating Disco Diffusion artworks using a simple one-line interface.

3.8K
Archived
Python
AI Image & Video
Animation & Motion
#generative-art #disco-diffusion #prompt-engineering
