A PyTorch-based toolkit for speech processing, including ASR, speaker recognition, and speech enhancement.
A neural network library for speaker diarization, including speech activity detection, speaker change detection, and speaker embedding.
Silero VAD is a pre-trained, enterprise-grade voice activity detector (VAD) for Python.
A comprehensive reading list for research topics in multimodal machine learning.
Foundation Architecture for (M)LLMs: a toolkit for building large language models and multimodal large language models.
An open-source library for multilingual automatic speech recognition with word-level timestamps and confidence scores.
A high-quality open-source PyTorch implementation of the WaveNet vocoder, a neural network for speech synthesis.
An AI-powered speech denoising and enhancement library for improving audio quality.
A fast, controllable text-to-speech library built with deep learning and PyTorch, supporting over 7,000 languages.
A lightweight, high-performance voice activity detector (VAD) library for conversational AI and real-time speech processing.
A PyTorch implementation of text-to-speech synthesis models based on convolutional neural networks.
A curated list of resources for speaker diarization, the speech processing task of determining who spoke when.
A collection of open-source speech corpora for building speech recognition, synthesis, and other audio applications.
A Python library for speech restoration, including tasks like declipping, denoising, and dereverberation.
An all-in-one model for offline and simultaneous speech recognition, translation, and synthesis.
SincNet is a neural architecture that efficiently processes raw audio waveforms for speech and audio tasks.
An open-source audio annotation tool for building machine learning and speech processing datasets.