Combining vocal tract length normalization with hierarchical linear transformations

Saheer, L.; Yamagishi, J.; Garner, P.N.; Dines, J.

oai:pure.ed.ac.uk:publications/6c7fe616-ae42-49da-9adc-28ba886fc09a

Publication date: 1 March 2012

Abstract

Recent research has demonstrated the effectiveness of vocal tract length normalization (VTLN) as a rapid adaptation technique for statistical parametric speech synthesis. VTLN produces speech with naturalness preferable to that of MLLR-based adaptation techniques, being much closer in quality to that generated by the original average voice model. However, with only a single parameter, VTLN captures very few speaker-specific characteristics compared to linear-transform-based adaptation techniques. This paper proposes that the merits of VTLN can be combined with those of linear-transform-based adaptation in a hierarchical Bayesian framework, where VTLN is used as the prior information. A novel technique for propagating the gender information from the VTLN prior through constrained structural maximum a posteriori linear regression (CSMAPLR) adaptation is presented. Experiments show that the resulting transformation improves speech quality, with better naturalness, intelligibility, and speaker similarity.

Last time updated on 08/02/2015

This paper was published in Edinburgh Research Explorer.
