Text-to-audio model for generating realistic speech and sounds.
Amphion is a toolkit for Audio, Music, and Speech Generation to support reproducible research.
A Python library for text-to-audio/music generation using diffusion models.
Generates high-fidelity Foley audio with multimodal diffusion and representation alignment.
An all-in-one model for offline and simultaneous speech recognition, translation, and synthesis.
An all-in-one web UI for different audio-related neural networks, including text-to-speech, voice cloning, and generative music.
A family of diffusion models for text-to-audio generation, targeting vibe coders who build with AI tools.
A PyTorch framework for generating audio from any modality, guided by Chain-of-Thought reasoning.
Efficient and high-quality text-to-audio generation with a latent consistency model.