phonemes

AI

End-to-End Adversarial Text-to-Speech

Text-to-speech engines are usually multi-stage pipelines that transform the signal into many intermediate representations and require supervision at each step. When trying to train TTS end-to-end, the alignment problem arises: Which text corresponds to which piece of sound? This paper uses an alignment module to tackle this problem and produces astonishingly good sound. Paper: https://arxiv.org/abs/2006.03575 […]

Read More