Explore Projects

Discover 178 open source projects

Active filters (1):
Search: multimodality

Showing 141-160 of 178 projects

dailenson/SDT

This repository provides an official implementation of a CVPR 2023 paper on handwriting generation using disentangled AI models.

1.3K
Experimental
Python
Computer Vision
Generative Models
PyTorch
#computer-vision #generative-models #multimodal

Capsize-Games/airunner

Offline AI-powered inference engine for art, chatbots, and automated workflows, focused on privacy and self-hosting.

1.3K
Stable
Python
Inference
AI Image & Video
PyGame
#ai #image-generation #chatbot

DLYuanGod/TinyGPT-V

An efficient multimodal large language model with a small backbone for AI-powered coding tools.

1.3K
Archived
Python
LLM Frameworks
AI Code Generation
PyTorch
#ai-tools #language-model #efficient-llm

Tencent-Hunyuan/HunyuanVideo-Foley

Generates high-fidelity foley audio with multimodal diffusion and representation alignment.

1.3K
Stable
Python
Prompt Engineering
React
#text-to-audio #diffusion #foley-sound-synthesis

valentinfrlch/ha-llmvision

A Python library for integrating visual intelligence into Home Assistant, a popular home automation platform.

1.3K
Active
Python
Computer Vision
Home Assistant
#home-assistant #computer-vision #multimodal

kakaobrain/coyo-dataset

A large-scale image-text dataset for training AI models, primarily focused on visual AI and multimodal AI tasks.

1.3K
Archived
Python
Computer Vision
Agents & Orchestration
#computer-vision #multimodal-ai #dataset

unum-cloud/UForm

Multimodal AI toolkit for fast content understanding and generation across text, images, and video.

1.2K
Stable
Python
LLM Frameworks
Computer Vision
PyTorch
#multimodal-ai #cross-modal #semantic-search

Henry-23/VideoChat

Real-time voice interactive digital human with customizable appearance and voice, supporting voice cloning.

1.2K
Stable
Python
React
#authentication #streaming #real-time

mims-harvard/TDC

A multimodal framework for drug discovery and therapeutic science research.

1.2K
Experimental
Jupyter Notebook
LLM Frameworks
Databases
#bioinformatics #cheminformatics #drug-discovery

Tencent-Hunyuan/HunyuanCustom

A multimodal-driven architecture for customized video generation, enabling developers to create unique AI-powered videos.

1.2K
Stable
Python
AI Image & Video
AI SDKs & Wrappers
#audio-driven #image-to-video #video-generation

gokayfem/awesome-vlm-architectures

A curated list of famous vision-language models and their architectures for developers working with AI tools.

1.2K
Active
Markdown
LLM Frameworks
Frontend Frameworks
React
#vision-language-models #multimodal #llm

xtreme1-io/xtreme1

An all-in-one data labeling and annotation platform for multimodal data training, supporting 3D LiDAR, images, and language models.

1.2K
Experimental
TypeScript
Computer Vision
Inference
#3d-annotation #annotation-tool #lidar-annotation

MoonshotAI/Kimi-VL

Kimi-VL is a multimodal AI model for advanced vision-language understanding and reasoning.

1.2K
Experimental
LLM Frameworks
Agents & Orchestration
#multimodal-ai #vision-language #reasoning

yuewang-cuhk/awesome-vision-language-pretraining-papers

A curated collection of recent advances in vision-language pretrained models (VL-PTMs) for AI and multimodal applications.

1.2K
Archived
Computer Vision
LLM Frameworks
#vision-language #multimodal #pretrained-models

uncbiag/Awesome-Foundation-Models

A curated list of foundation models for vision and language tasks, useful for vibe coders building AI-powered applications.

1.1K
Experimental
LLM Frameworks
Multimodal Models
#foundation-models #large-language-models #multimodal-models

alan-ai/alan-sdk-cordova

The Alan AI SDK for Cordova provides a conversational AI interface for building voice-enabled apps.

1.1K
Experimental
Ruby
AI Voice & Speech
Component Libraries (React)
React
#chatbot #conversational-ai #speech-recognition

nv-tlabs/LLaMA-Mesh

A Python library that unifies 3D mesh generation with language models, enabling AI-driven 3D content creation.

1.1K
Experimental
Python
LLM Frameworks
Computer Vision
PyTorch
#3d-generation #mesh-generation #multimodal

AIDC-AI/Awesome-Unified-Multimodal-Models

A curated collection of awesome unified multimodal models for text-to-image generation and vision-language tasks.

1.1K
Active
LLM Frameworks
Computer Vision
#multimodal-models #text-to-image-generation #vision-language-model

TheShadow29/awesome-grounding

A curated list of research papers on visual grounding, a key technique for multimodal AI.

1.1K
Stable
Computer Vision
Language Grounding
#computer-vision #language-grounding #multimodal-ai

lhotse-speech/lhotse

Lhotse is a set of tools for handling multimodal data in machine learning projects, with a focus on speech and audio.

1.1K
Active
Python
Speech & Voice
Data Pipelines
PyTorch
#speech-recognition #audio-processing #data-handling