Unified, production-ready inference API for running open-source models, including speech and multimodal models, on the cloud, on-premises, or on your laptop.
Qwen3-TTS is an open-source series of TTS models that enable stable, expressive, and streaming speech generation.
Python speech recognition library supporting multiple engines and APIs, both online and offline.
A sound cloning tool with a web interface that lets you use your voice, or any other sound, to record audio.
An open-source library for running OpenAI's Whisper speech recognition model efficiently on a variety of platforms.
ModelScope is an open-source AI framework built around the Model-as-a-Service concept, providing a suite of tools for building, deploying, and managing AI models.
Bert-VITS2 is a Python project that combines a VITS2 backbone with multilingual BERT for text-to-speech synthesis.
A Jupyter Notebook project for zero-shot speech editing and text-to-speech.
EmotiVoice is a multi-voice, prompt-controlled TTS engine built with PyTorch.
Silero VAD is a pre-trained enterprise-grade Voice Activity Detector library for Python.
A deep learning-based Chinese speech recognition system for developers working on AI-powered speech applications.
An open-source implementation of Microsoft's VALL-E X zero-shot text-to-speech model, enabling voice cloning and emotional speech synthesis.
A PyTorch-based text-to-speech model that generates high-quality speech with expressive prosody.
AI-powered wearable device, aimed at developers, that transcribes speech and summarizes conversations.
A multilingual voice understanding model for AI-powered audio analysis and transcription.
Automagically synchronize subtitles with video using audio alignment and speech detection.
A simple native web interface for ChatTTS text-to-speech synthesis with API support.
A command-line translator using popular translation services like Google Translate and Bing Translator.
High-quality multilingual text-to-speech library supporting English, Spanish, French, Chinese, Japanese, and Korean.
Zonos is an open-source, high-quality text-to-speech model for developers building AI-powered applications.