Speaker characterization by means of attention pooling

Costa, Federico; India Massana, Miquel Àngel; Hernando Pericás, Francisco Javier

oai:upcommons.upc.edu:2117/384802

Authors: Federico Costa, Miquel Àngel India Massana, Francisco Javier Hernando Pericás
Publication date: 1 January 2022
Publisher: International Speech Communication Association (ISCA)

Abstract

State-of-the-art deep learning systems for speaker verification are commonly based on speaker embedding extractors. These architectures are usually composed of a feature extractor front-end together with a pooling layer that encodes variable-length utterances into fixed-length speaker vectors. The authors have recently proposed the use of a Double Multi-Head Self Attention pooling for speaker recognition, placed between a CNN-based front-end and a set of fully connected layers. This has been shown to be an excellent approach for efficiently selecting the most relevant features captured by the front-end from the speech signal. In this paper we show excellent experimental results obtained by adapting this architecture to other speaker characterization tasks, such as emotion recognition, sex classification and COVID-19 detection.

Peer reviewed. Postprint (published version).
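To make the pooling idea in the abstract concrete, the following is a minimal NumPy sketch of a two-stage attention pooling that maps a variable-length sequence of front-end features to a fixed-length vector. It is an illustration under stated assumptions, not the authors' implementation: the function name `double_attention_pool`, the scaled dot-product scoring, the way features are split into heads, and the random stand-ins for trainable queries are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def double_attention_pool(features, head_queries, final_query):
    """Pool (T, D) frame features into one vector of size D // K.

    features:     (T, D) front-end outputs for one utterance, T frames
    head_queries: (K, D // K) one query per head (assumed trainable)
    final_query:  (D // K,) query for the second stage (assumed trainable)
    """
    T, D = features.shape
    K, d = head_queries.shape
    assert K * d == D, "feature size must split evenly across heads"

    # Stage 1: split each frame into K sub-vectors and attend over time
    # separately within each head.
    heads = features.reshape(T, K, d)
    scores = np.einsum("tkd,kd->tk", heads, head_queries) / np.sqrt(d)
    weights = softmax(scores, axis=0)  # (T, K), each column sums to 1
    head_vectors = np.einsum("tk,tkd->kd", weights, heads)  # (K, d)

    # Stage 2: attend over the K head vectors to get a single vector.
    head_scores = head_vectors @ final_query / np.sqrt(d)
    head_weights = softmax(head_scores, axis=0)  # (K,)
    return head_weights @ head_vectors  # (d,)


# Utterances of different lengths map to vectors of the same size.
rng = np.random.default_rng(0)
K, d = 4, 8
head_queries = rng.normal(size=(K, d))
final_query = rng.normal(size=d)
short_utt = rng.normal(size=(50, K * d))
long_utt = rng.normal(size=(300, K * d))
short_vec = double_attention_pool(short_utt, head_queries, final_query)
long_vec = double_attention_pool(long_utt, head_queries, final_query)
print(short_vec.shape, long_vec.shape)
```

In a real system the queries would be learned jointly with the front-end and the fully connected layers; here they are random values used only to show that the output size does not depend on the number of frames.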

Last updated on 21/03/2023

This paper was published in UPCommons. Portal del coneixement obert de la UPC.
