This repository provides an official implementation of a CVPR 2023 paper on handwriting generation using disentangled AI models.
An offline, AI-powered inference engine for art, chatbots, and automated workflows, focused on privacy and self-hosting.
An efficient multimodal large language model with a small backbone for AI-powered coding tools.
Generates high-fidelity foley audio with multimodal diffusion and representation alignment.
A Python library for integrating visual intelligence into Home Assistant, a popular home automation platform.
A large-scale image-text dataset for training AI models, aimed primarily at vision and multimodal tasks.
A multimodal AI toolkit for fast content understanding and generation across text, images, and video.
Real-time voice interactive digital human with customizable appearance and voice, supporting voice cloning.
A multimodal framework for drug discovery and therapeutic science research.
A multimodal-driven architecture for customized video generation, enabling developers to create unique AI-powered videos.
A curated list of famous vision-language models and their architectures for developers working with AI tools.
An all-in-one data labeling and annotation platform for multimodal training data, supporting 3D LiDAR, images, and language models.
Kimi-VL is a multimodal AI model for advanced vision-language understanding and reasoning.
A curated collection of recent advances in vision-language pretrained models (VL-PTMs) for AI and multimodal applications.
A curated list of foundation models for vision and language tasks, useful for vibe coders building AI-powered applications.
The Alan AI SDK for Cordova provides a conversational AI interface for building voice-enabled apps.
A Python library that unifies 3D mesh generation with language models, enabling AI-driven 3D content creation.
A curated collection of awesome unified multimodal models for text-to-image generation and vision-language tasks.
A curated list of research papers on visual grounding, a key technique for multimodal AI.
Lhotse is a set of tools for handling multimodal data in machine learning projects, with a focus on speech and audio.