In voice conversion, frame-level mean and variance normalization is typically used for fundamental frequency (F0) transformation, which is text-independent and requires no parallel training data. Some advanced methods transform pitch contours instead, but require either parallel training data or syllabic annotations. We propose a method which retains the simplicity and text-independence of the frame-level conversion while yielding high-quality conversion. We achieve these goals by (1) introducing a text-independent tri-frame alignment method, (2) including delta features of F0 into Gaussian mixture model (GMM) conversion and (3) reducing the well-known GMM oversmoothing effect by F0 histogram equalization. Our objective and subjective experiments on the CMU Arctic corpus indicate improvements over both the mean/variance normalization and the baseline GMM conversion.
Index Terms: Voice conversion, F0 transformation, GMM, histogram equalization, text-independence
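As a rough illustration of two ideas named in the abstract, the sketch below shows frame-level mean/variance normalization of F0 and a generic histogram equalization step. It is not the authors' implementation. Several details are assumptions made only for this example: that the normalization is carried out in the log-F0 domain, that unvoiced frames are marked with 0, and that histogram equalization is done by simple quantile mapping onto a set of reference (target-speaker) values. The function names are hypothetical.

```python
import numpy as np

def log_f0_stats(f0):
    """Mean and standard deviation of log-F0 over voiced frames.
    Assumption: unvoiced frames are marked with F0 == 0."""
    f0 = np.asarray(f0, dtype=float)
    log_f0 = np.log(f0[f0 > 0])
    return log_f0.mean(), log_f0.std()

def mean_variance_convert(f0_src, src_stats, tgt_stats):
    """Frame-level mean/variance normalization.
    Assumption: applied in the log-F0 domain; unvoiced frames stay 0."""
    f0_src = np.asarray(f0_src, dtype=float)
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    out = np.zeros_like(f0_src)
    voiced = f0_src > 0
    # Standardize with source statistics, then rescale with target statistics.
    z = (np.log(f0_src[voiced]) - mu_s) / sd_s
    out[voiced] = np.exp(z * sd_t + mu_t)
    return out

def histogram_equalize(values, reference):
    """Quantile mapping: reshape the empirical distribution of `values`
    to match that of `reference` while preserving the rank order."""
    values = np.asarray(values, dtype=float)
    reference = np.sort(np.asarray(reference, dtype=float))
    # Rank of each value within its own set, converted to a mid-point quantile.
    ranks = np.argsort(np.argsort(values))
    quantiles = (ranks + 0.5) / len(values)
    ref_quantiles = (np.arange(len(reference)) + 0.5) / len(reference)
    # Look up the reference value at the same quantile.
    return np.interp(quantiles, ref_quantiles, reference)

# Toy usage with made-up F0 values in Hz (0 marks an unvoiced frame).
source_f0 = np.array([100.0, 0.0, 200.0])
target_f0 = np.array([200.0, 400.0])
converted = mean_variance_convert(
    source_f0, log_f0_stats(source_f0), log_f0_stats(target_f0)
)
equalized = histogram_equalize([3.0, 1.0, 2.0], [10.0, 20.0, 30.0])
```

Because quantile mapping restores the spread of the reference distribution, a step of this kind is one plausible way to counteract the narrowed dynamic range that GMM oversmoothing produces; how the paper applies it in detail is not stated in the abstract.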