
Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion

Abstract

In voice conversion, frame-level mean and variance normalization is typically used for fundamental frequency (F0) transformation, which is text-independent and requires no parallel training data. Some advanced methods transform pitch contours instead, but require either parallel training data or syllabic annotations. We propose a method which retains the simplicity and text-independence of the frame-level conversion while yielding high-quality conversion. We achieve these goals by (1) introducing a text-independent tri-frame alignment method, (2) including delta features of F0 into Gaussian mixture model (GMM) conversion and (3) reducing the well-known GMM oversmoothing effect by F0 histogram equalization. Our objective and subjective experiments on the CMU Arctic corpus indicate improvements over both the mean/variance normalization and the baseline GMM conversion.

Index Terms: Voice conversion, F0 transformation, GMM, histogram equalization, text-independence
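
As a rough illustration of two ideas named in the abstract, the sketch below shows a frame-level mean/variance normalization of F0 and a generic histogram equalization by quantile mapping. It is not the authors' implementation. The abstract gives no formulas, so the following are assumptions: the normalization is applied to log-F0, unvoiced frames are marked by F0 = 0, and equalization is done by matching empirical quantiles. The function names (log_f0_stats, mean_variance_convert, histogram_equalize) are hypothetical, and the tri-frame alignment and GMM steps are not covered.

```python
import numpy as np


def log_f0_stats(f0):
    """Mean and standard deviation of log-F0 over voiced frames (F0 > 0)."""
    f0 = np.asarray(f0, dtype=float)
    lf0 = np.log(f0[f0 > 0])
    return lf0.mean(), lf0.std()


def mean_variance_convert(f0_src, src_stats, tgt_stats):
    """Frame-level mean/variance normalization in the log-F0 domain.

    Assumption: unvoiced frames are coded as 0 and are passed through as 0.
    """
    f0_src = np.asarray(f0_src, dtype=float)
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    out = np.zeros_like(f0_src)
    voiced = f0_src > 0
    # Standardize with source statistics, then rescale to target statistics.
    z = (np.log(f0_src[voiced]) - mu_s) / sd_s
    out[voiced] = np.exp(z * sd_t + mu_t)
    return out


def histogram_equalize(values, reference):
    """Map values so their empirical distribution follows that of reference.

    Assumption: plain quantile mapping; the paper's exact variant may differ.
    """
    values = np.asarray(values, dtype=float)
    reference = np.sort(np.asarray(reference, dtype=float))
    # Rank of each value within its own sequence, turned into a quantile.
    ranks = np.argsort(np.argsort(values))
    quantiles = (ranks + 0.5) / len(values)
    ref_quantiles = (np.arange(len(reference)) + 0.5) / len(reference)
    # Read off the reference value at the same quantile.
    return np.interp(quantiles, ref_quantiles, reference)


# Toy F0 tracks in Hz (made-up numbers, 0 = unvoiced).
source_f0 = np.array([100.0, 0.0, 120.0, 110.0, 0.0, 130.0])
target_f0 = np.array([200.0, 220.0, 0.0, 240.0, 210.0])

converted = mean_variance_convert(
    source_f0, log_f0_stats(source_f0), log_f0_stats(target_f0)
)
equalized = histogram_equalize([3.0, 1.0, 2.0], [10.0, 20.0, 30.0])
```

In this sketch the converted track has the target speaker's log-F0 mean and standard deviation on voiced frames, while histogram_equalize reshapes a sequence toward a reference distribution, which is one generic way to counteract a narrowed (oversmoothed) output range.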

This paper was published in Edinburgh Research Explorer.
