A Log Domain Pulse Model for Parametric Speech Synthesis

Accepted version
Peer-reviewed

Type

Article

Authors

Degottex, Gilles 
Lanchantin, Pierre 

Abstract

Most of the degradation in current Statistical Parametric Speech Synthesis (SPSS) results from the form of the vocoder. One of the main causes of degradation is the reconstruction of the noise. In this article, a new signal model is proposed that leads to a simple synthesizer, without the need for ad-hoc tuning of model parameters. The model is not based on the traditional additive linear source-filter model; instead, it adopts a combination of speech components that are additive in the log domain. Also, the same representation for voiced and unvoiced segments is used, rather than relying on binary voicing decisions. This avoids voicing error discontinuities that can occur in many current vocoders. A simple binary mask is used to denote the presence of noise in the time-frequency domain, which is less sensitive to classification errors. Four experiments have been carried out to evaluate this new model. The first experiment examines the noise reconstruction issue. Three listening tests have also been carried out that demonstrate the advantages of this model: comparison with the STRAIGHT vocoder; the direct prediction of the binary noise mask by using a mixed output configuration; and partial improvements of creakiness using a mask correction mechanism.

Keywords

speech, speech processing, speech synthesis, text-to-speech, parametric speech synthesis, acoustic model, voice, pulse model

Journal Title

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Journal ISSN

2329-9290
2329-9304

Volume

26

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Sponsorship

European Commission Horizon 2020 (H2020) Marie Skłodowska-Curie actions (655764)
European Union's Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie; 10.13039/501100000266-EPSRC