🐸TTS is a deep learning toolkit for advanced Text-to-Speech generation, with support for 1100+ languages and tools for training and fine-tuning models.
A scalable generative AI framework for researchers and developers
An open-source speech recognition toolkit used for building speech recognition systems.
A comprehensive speech recognition toolkit with state-of-the-art pretrained models for various speech tasks.
Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Open-source speech toolkit with state-of-the-art ASR, TTS, translation, and audio processing capabilities.
An open-source project that connects the Xiaomi AI speaker to ChatGPT and Doubao, turning it into a custom voice assistant.
A PyTorch-based toolkit for speech processing, including ASR, speaker recognition, and speech enhancement.
An offline-capable speech processing library for embedded systems, supporting a wide range of languages and platforms.
A privacy-focused, open-source AI meeting assistant with fast transcription, speaker diarization, and summarization, built on Rust.
A deep learning library for text-to-speech applications, focusing on generating high-quality speech from text.
End-to-end speech processing toolkit for tasks like speech recognition, synthesis, translation, and more.
An open-source Python project that plays music through a Xiaomi AI speaker, using yt-dlp to download tracks.
A neural network library for speaker diarization, including speech activity detection, speaker change detection, and speaker embedding.
EmotiVoice is a multi-voice and prompt-controlled TTS engine built with PyTorch for developers working with AI voice tools.
An open-source Chinese voice assistant project that supports ChatGPT-like conversational abilities and brain-computer interface integration.
A Python library that allows developers to interact with ChatGPT and other large language models using a Xiaomi AI speaker.
Automatic Speech Recognition with Speaker Diarization using OpenAI Whisper
A real-time state-of-the-art speech synthesis library for TensorFlow 2, supporting multiple languages.
An open-source toolkit for speech processing, supporting enhancement, separation, and target speaker extraction.