Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion

Kinnunen, T; Chng, E. S.; Li, Haizhou; Wu, Zhizheng

oai:pure.ed.ac.uk:publications/c5945ecf-1f74-461f-9dc0-401aa28617eb

Publication date: 1 January 2010

Abstract

In voice conversion, frame-level mean and variance normalization is typically used for fundamental frequency (F0) transformation, which is text-independent and requires no parallel training data. Some advanced methods transform pitch contours instead, but require either parallel training data or syllabic annotations. We propose a method which retains the simplicity and text-independence of the frame-level conversion while yielding high-quality conversion. We achieve these goals by (1) introducing a text-independent tri-frame alignment method, (2) including delta features of F0 into Gaussian mixture model (GMM) conversion and (3) reducing the well-known GMM oversmoothing effect by F0 histogram equalization. Our objective and subjective experiments on the CMU Arctic corpus indicate improvements over both the mean/variance normalization and the baseline GMM conversion.

Index Terms: Voice conversion, F0 transformation, GMM, histogram equalization, text-independence
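
To make two of the ideas named in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It shows (a) the frame-level mean/variance normalization baseline and (b) a generic histogram equalization by empirical CDF matching. Several details are assumptions not stated in the abstract: the normalization is done in the log-F0 domain, unvoiced frames are marked by an F0 of 0 and passed through unchanged, the source training data has non-zero log-F0 variance, and the function names `mean_variance_convert` and `histogram_equalize` are hypothetical. The tri-frame alignment and the GMM conversion with delta features are not covered here.

```python
import math
from bisect import bisect_right
from statistics import mean, pstdev


def mean_variance_convert(src_f0, src_train, tgt_train):
    """Frame-level mean/variance normalization (assumed here to act on log-F0).

    src_f0:    F0 values (Hz) of the utterance to convert; 0 marks unvoiced.
    src_train: F0 values (Hz) from the source speaker (non-parallel data).
    tgt_train: F0 values (Hz) from the target speaker (non-parallel data).
    """
    # Statistics are estimated on voiced frames only.
    src_log = [math.log(f) for f in src_train if f > 0]
    tgt_log = [math.log(f) for f in tgt_train if f > 0]
    mu_s, sd_s = mean(src_log), pstdev(src_log)
    mu_t, sd_t = mean(tgt_log), pstdev(tgt_log)

    converted = []
    for f in src_f0:
        if f <= 0:
            converted.append(0.0)  # keep unvoiced frames unvoiced
        else:
            z = (math.log(f) - mu_s) / sd_s  # standardize w.r.t. source
            converted.append(math.exp(mu_t + sd_t * z))  # rescale to target
    return converted


def histogram_equalize(values, reference):
    """Map each value to the reference quantile at the same empirical CDF rank.

    This is plain CDF matching; how it is applied to GMM-converted F0 in the
    paper is not specified in the abstract.
    """
    sorted_vals = sorted(values)
    sorted_ref = sorted(reference)
    n, m = len(sorted_vals), len(sorted_ref)

    equalized = []
    for v in values:
        p = bisect_right(sorted_vals, v) / n  # empirical CDF, in (0, 1]
        idx = min(m - 1, max(0, math.ceil(p * m) - 1))  # matching quantile
        equalized.append(sorted_ref[idx])
    return equalized
```

Neither function needs time-aligned (parallel) utterances or text labels, which is the property the abstract attributes to the frame-level baseline: both rely only on the overall F0 distributions of the two speakers.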

Last time updated on 09/08/2016

This paper was published in Edinburgh Research Explorer.
