AudioPairBank: towards a large-scale tag-pair-based audio content analysis

Sebastian Säger; Benjamin Elizalde; Damian Borth; Christian Schulze; Bhiksha Raj; Ian Lane


oai:doaj.org/article:58ff7e5a039f4655b122e82aedbd8bf4

Publication date: 1 September 2018
Publisher: 'Springer Science and Business Media LLC'

Abstract

Recently, sound recognition has been used to identify sounds such as that of a car or a river. However, sounds have nuances that may be better described by adjective-noun pairs such as “slow car” and verb-noun pairs such as “flying insects,” and these pairs remain underexplored. This work therefore investigates the relationship between audio content and both adjective-noun pairs and verb-noun pairs. Because datasets with these kinds of annotations were lacking, we collected and processed the AudioPairBank corpus, which consists of a combined total of 1,123 pairs and over 33,000 audio files. In this paper, we include previously unavailable documentation of the challenges and implications of collecting audio recordings with these types of labels. We also show the degree of correlation between the audio content and the labels through classification experiments, which yielded 70% accuracy. The results and study in this paper encourage further exploration of the nuances in sounds and are meant to complement similar research performed on images and text in multimedia analysis.

Last updated on 04/06/2019

This paper was published in the Directory of Open Access Journals.
