kaldi-asr/kaldi is the official location of the Kaldi project.
Pre-trained and Reproduced Deep Learning Models (the official PaddlePaddle model zoo, containing deep learning models validated in cutting-edge research and industrial scenarios)
Code examples for new APIs of iOS 10.
Lingvo
pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by PyTorch, while feature extraction, label computation, and decoding are performed with the Kaldi toolkit.
A TensorFlow Implementation of Tacotron: A Fully End-to-End Text-To-Speech Synthesis Model
WaveNet vocoder
DELTA is a deep learning based natural language and speech processing platform.
Python library and CLI tool to interface with Google Translate's text-to-speech API
Open-Source Large Vocabulary Continuous Speech Recognition Engine
A PaddlePaddle implementation of DeepSpeech2 architecture for ASR.
A TensorFlow Implementation of DC-TTS: yet another text-to-speech model
hi,
as you know, the number of filters in SoLoud is limited.
We should implement more, such as different reverbs, FIR and IIR filters (these could be used to implement HRTF support), chorus, One Pole, One Zero, Pole Zero, Two Pole, Two Zero, etc.
A library called STK, available under the zlib license, already implements these; maybe we could port some of them over.
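As a rough illustration of the kind of building block being requested, here is a minimal sketch of a one-pole IIR low-pass filter in Python. This is not SoLoud or STK code; the function name and coefficient are illustrative only.

```python
def one_pole(samples, pole=0.9):
    """One-pole IIR low-pass: y[n] = (1 - pole) * x[n] + pole * y[n-1].

    A pole closer to 1.0 gives heavier smoothing; 0.0 passes the
    input through unchanged. STK's OnePole class wraps the same recurrence.
    """
    out = []
    y = 0.0
    for x in samples:
        y = (1.0 - pole) * x + pole * y
        out.append(y)
    return out
```

Feeding in an impulse shows the exponential decay characteristic of a one-pole filter: `one_pole([1.0, 0.0, 0.0], pole=0.5)` yields `[0.5, 0.25, 0.125]`.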
A Python wrapper for Kaldi
Speech Enhancement Generative Adversarial Network in TensorFlow
Voice activity detection (VAD) toolkit including DNN-, bDNN-, LSTM-, and ACAM-based VAD. We also provide our own directly recorded dataset.
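For context on what a VAD decides, here is a minimal energy-threshold sketch: the classical baseline that DNN-based VADs improve on. The function name and threshold are illustrative assumptions, not part of the toolkit above.

```python
def energy_vad(frames, threshold):
    """Classify each frame as speech (True) or silence (False).

    A frame is a list of samples; its mean energy (average squared
    amplitude) is compared against a fixed threshold.
    """
    return [sum(s * s for s in frame) / len(frame) > threshold
            for frame in frames]
```

For example, `energy_vad([[0.0, 0.0], [1.0, 1.0]], threshold=0.5)` returns `[False, True]`. Neural VADs replace this hand-set threshold with a learned decision function that is far more robust to noise.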
A neural network model capable of detecting five different male/female emotions from audio speech. (Deep Learning, NLP, Python)
Node.js client for Google Cloud Speech: Speech to text conversion powered by machine learning.
A neural network for end-to-end speech denoising
The J.A.R.V.I.S. Speech API is designed to be simple and efficient, using the speech engines created by Google to provide functionality for parts of the API. Essentially, it is an API written in Java, including a recognizer, synthesizer, and a microphone capture utility. The project uses Google services for the synthesizer and recognizer. While this requires an Internet connection, it provides a complete, modern, and fully functional speech API in Java.
We have to create contributing files for two repositories:
Also, please update cboard's contributing file, in order to:
An implementation of SpecAugment with TensorFlow & PyTorch, introduced by Google Brain
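SpecAugment's core idea is simple: mask random bands of a spectrogram during training. The following is a framework-free sketch of its frequency-masking step, using plain Python lists; the function name and parameters are illustrative, not the repository's API.

```python
import random

def freq_mask(spec, max_width):
    """Zero out a random band of frequency rows, SpecAugment-style.

    spec is a 2-D list indexed as spec[freq][time]. A band of up to
    max_width consecutive frequency rows is replaced with zeros.
    The input spectrogram is left unmodified.
    """
    n_freq = len(spec)
    width = random.randint(0, max_width)       # mask width, inclusive
    start = random.randint(0, n_freq - width)  # band start row
    masked = [row[:] for row in spec]          # copy, don't mutate input
    for f in range(start, start + width):
        masked[f] = [0.0] * len(masked[f])
    return masked
```

Time masking works the same way along the other axis, and the paper applies both (plus time warping) to make models robust to partially missing input.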
This issue is for reporting errors in the documentation and offering updates.
If you find an error in the documentation, or content that is hard to understand, feel free to add a comment. (Adding a suggestion would make it easier for the contributor to work on.)
If you would like to work on a fix, feel free to open a PR and attach a screenshot of the documentation.
See examples: https://github.com/p