
Multi-speaker tracking from an audio-visual sensing device

Abstract

Compact multi-sensor platforms are portable and thus desirable for robotics and personal-assistance tasks. However, compared with physically distributed sensors, the compactness of these platforms makes person tracking more difficult. To address this challenge, we propose a novel 3D audio-visual people tracker that exploits visual observations (object detections) to guide the acoustic processing by constraining the acoustic likelihood on the horizontal plane defined by the predicted height of a speaker. This solution allows the tracker to estimate, with a small microphone array, the distance of a sound source. Moreover, we apply a color-based visual likelihood on the image plane to compensate for missed detections. Finally, we use a 3D particle filter and greedy data association to combine the visual observations with the color-based and acoustic likelihoods and track the positions of multiple simultaneous speakers. We compare the proposed multimodal 3D tracker against two state-of-the-art methods on the AV16.3 dataset and on a newly collected dataset with co-located sensors, which we make available to the research community. Experimental results show that our multimodal approach outperforms the other methods both in 3D and on the image plane.
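To make the fusion strategy concrete, the sketch below illustrates a single step of a 3D particle-filter update that multiplies an acoustic likelihood with a visual one before resampling. It is a minimal illustration, not the authors' implementation: the function names, Gaussian likelihood forms, noise parameters, and mock observations are all assumptions introduced here.

```python
# Minimal sketch (not the paper's code) of a 3D particle-filter update that
# fuses an acoustic direction-of-arrival likelihood with a visual likelihood.
# All names, likelihood forms, and parameter values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

N = 500                                                       # particles per tracked speaker
particles = rng.normal([0.0, 2.0, 1.6], 0.3, size=(N, 3))     # 3D positions (x, y, z) in metres
weights = np.full(N, 1.0 / N)

def predict(particles, motion_std=0.05):
    """Constant-position motion model with Gaussian process noise."""
    return particles + rng.normal(0.0, motion_std, size=particles.shape)

def acoustic_likelihood(particles, doa_azimuth, sigma=0.2):
    """Likelihood of an azimuth (direction-of-arrival) observation. In the paper
    the acoustic likelihood is constrained to the horizontal plane at the
    speaker's predicted height, which lets a small array resolve distance;
    here a simple Gaussian over azimuth stands in for that term."""
    az = np.arctan2(particles[:, 1], particles[:, 0])
    return np.exp(-0.5 * ((az - doa_azimuth) / sigma) ** 2)

def visual_likelihood(particles, detection_xy, sigma=0.15):
    """Color/detection likelihood, approximated by a Gaussian around a
    back-projected detection purely for illustration."""
    d = np.linalg.norm(particles[:, :2] - detection_xy, axis=1)
    return np.exp(-0.5 * (d / sigma) ** 2)

def update(particles, weights, doa_azimuth, detection_xy):
    """Fuse modalities by multiplying likelihoods, then resample."""
    w = weights * acoustic_likelihood(particles, doa_azimuth) \
                * visual_likelihood(particles, detection_xy)
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# One tracking step with a mock azimuth observation and a mock detection.
particles = predict(particles)
particles, weights = update(particles, weights,
                            doa_azimuth=np.arctan2(2.0, 0.1),
                            detection_xy=np.array([0.1, 2.0]))
print("estimated position:", particles.mean(axis=0))
```

In a multi-speaker setting, one such filter would run per speaker, with greedy data association assigning each detection and acoustic observation to the nearest predicted track before the update.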

This paper was published in Queen Mary Research Online.
