Deep learning paper implementations with side-by-side notes and explanations
Few-shot voice cloning and TTS with 1 minute of training data
A collection of PyTorch image encoders/backbones with training, evaluation, and inference scripts.
Voice conversion framework with web UI for training and real-time voice models
Singing Voice Conversion framework using AI
FishAudio-S1 is a high-quality open-source TTS model with voice cloning capabilities.
A deep learning model that converts images of mathematical equations into LaTeX code.
An offline-capable speech processing library for embedded systems, supporting a wide range of languages and platforms.
An open-source, large language model-based multimodal dialogue system that achieves near-GPT-4o performance.
Amphion is a toolkit for Audio, Music, and Speech Generation to support reproducible research.
A fork of the so-vits-svc project with real-time support, an improved interface, and more features for AI-powered voice conversion.
Bert-VITS2 is a Python library that implements the VITS2 backbone with multilingual-BERT for speech synthesis and text-to-speech applications.
A PyTorch-based text-to-speech model that generates high-quality speech with expressive prosody.
Open-source toolkit for evaluating large multi-modal AI models, supporting 220+ models and 80+ benchmarks.
VITS-based audio colorization model for music
A fast and simple framework for building neural data processing pipelines using Python.
Quantized attention that achieves 2-5x speedup over FlashAttention for language, image, and video models.
A high-quality voice conversion tool focused on ease of use and performance for AI-powered audio applications.
Executable file for VITS inference, a neural text-to-speech model for generating high-quality speech.
Turn any computer or edge device into a command center for your computer vision projects.