Exemplar-based speech waveform generation for text-to-speech

Valentini Botinhao, Cassia; Watts, Oliver; Espic Calderón, Felipe; King, Simon

oai:pure.ed.ac.uk:publications/448b7f86-60df-43d2-9d8a-a66ad8856a4a

Authors: Cassia Valentini Botinhao, Oliver Watts, Felipe Espic Calderón, Simon King
Publication date: 14 February 2019
Publisher: Institute of Electrical and Electronics Engineers (IEEE)

Abstract

This paper presents a hybrid text-to-speech framework that uses a waveform generation method based on exemplars of natural speech waveforms. These exemplars are selected at synthesis time given a sequence of acoustic features generated from text by a statistical parametric speech synthesis model. To match the expected degradation of these target synthesis features, the database of units is constructed such that the units’ target representations are generated from the same parametric model. We evaluate two variants of this framework that differ in the size of the exemplar: a small unit variant (where unit boundaries are determined by pitch mark locations) and a halfphone variant (where unit boundaries are determined by subphone state forced alignment). We found that for a larger dataset (around four hours of training data) the exemplar-based waveform generation variants are rated higher than the vocoder-based system.
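
The selection step described in the abstract can be illustrated with a small dynamic-programming search. This is only a sketch under stated assumptions, not the authors' method: the abstract does not give the cost functions, so the Euclidean target cost, the flat join penalty for units that are not adjacent in the database, and all names (`select_exemplars`, `unit_targets`, `join_penalty`) are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def select_exemplars(targets, unit_targets, join_penalty=1.0):
    """Return one database unit index per target frame (lowest total cost).

    targets      : feature vectors predicted from text by the parametric model
    unit_targets : one feature vector per database unit, assumed to be
                   generated by the same parametric model
    join_penalty : ASSUMED flat cost for joining two units that were not
                   consecutive in the database (consecutive units join free)
    """
    n_units = len(unit_targets)
    # cost[j]: lowest cumulative cost of any path that ends in unit j
    cost = [dist(targets[0], u) for u in unit_targets]
    back = []  # back[t][j]: predecessor of unit j at frame t + 1
    for t in targets[1:]:
        best_prev = min(range(n_units), key=cost.__getitem__)
        new_cost, pointers = [], []
        for j, u in enumerate(unit_targets):
            # Option A: jump from the cheapest predecessor and pay the penalty.
            cand, prev = cost[best_prev] + join_penalty, best_prev
            # Option B: continue from database neighbour j - 1 at no join cost.
            if j > 0 and cost[j - 1] <= cand:
                cand, prev = cost[j - 1], j - 1
            new_cost.append(cand + dist(t, u))  # add the target cost
            pointers.append(prev)
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest path back from the last frame to the first.
    j = min(range(n_units), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path
```

With `join_penalty=0.0` the search reduces to picking the nearest unit for each frame independently; a larger penalty favours runs of units that were contiguous in the recorded speech.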

Last time updated on 16/08/2019

This paper was published in Edinburgh Research Explorer.
