Self-supervised object detection from audio-visual correspondence

Afouras, T.; Asano, Y.M.; Fagan, F.; Vedaldi, A.; Metze, F.

oai:dare.uva.nl:openaire_cris_publications/6874f5ad-b084-4bbb-b3c8-675faafd998a

Authors: T. Afouras, Y.M. Asano, F. Fagan, A. Vedaldi, F. Metze
Publication date: 1 January 2022
Publisher: IEEE Computer Society
Abstract

We tackle the problem of learning object detectors without supervision. Unlike weakly-supervised object detection, we do not assume image-level class labels. Instead, we extract a supervisory signal from audio-visual data, using the audio component to “teach” the object detector. While this problem is related to sound source localisation, it is considerably harder because the detector must classify the objects by type, enumerate each instance of the object, and do so even when the object is silent. We tackle this problem by first designing a self-supervised framework with a contrastive objective that jointly learns to classify and localise objects. Then, without using any supervision, we use these self-supervised labels and boxes to train an image-based object detector. With this, we outperform previous unsupervised and weakly-supervised detectors on the tasks of object detection and sound source localisation. We also show that we can align this detector to ground-truth classes with as few as one label per pseudo-class, and that our method can learn to detect generic objects beyond instruments, such as airplanes and cats.
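
The abstract names a contrastive objective over audio-visual pairs but does not spell out its form. As a rough illustration only, and not the authors' implementation, the sketch below shows one common way such an objective can be set up: an InfoNCE-style loss in which each image's per-location visual features are compared with every audio embedding in the batch, and the best-matching location gives both the image-audio score and a coarse localisation. The function names, tensor shapes, max-pooling over locations, and the temperature value are all assumptions made for this example.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def audio_visual_contrastive_loss(visual_maps, audio_embs, temperature=0.07):
    """Hypothetical InfoNCE-style audio-visual loss (illustrative assumption).

    visual_maps: (B, H, W, D) per-location visual features, one map per frame.
    audio_embs:  (B, D) one audio embedding per clip; row i pairs with frame i.
    Returns (loss, peaks), where peaks[i] is the (row, col) location in frame i
    that responds most strongly to its own audio.
    """
    v = l2_normalize(visual_maps)
    a = l2_normalize(audio_embs)

    # sim[i, j, h, w]: cosine similarity of frame i at (h, w) with audio j.
    sim = np.einsum("ihwd,jd->ijhw", v, a)
    batch, _, height, width = sim.shape
    flat = sim.reshape(batch, batch, height * width)

    # Max-pool over locations to get one score per (frame, audio) pair.
    scores = flat.max(axis=-1) / temperature

    def cross_entropy_on_diagonal(logits):
        # Numerically stable log-softmax; matching pairs sit on the diagonal.
        logits = logits - logits.max(axis=1, keepdims=True)
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Symmetric loss: frame-to-audio and audio-to-frame retrieval.
    loss = 0.5 * (cross_entropy_on_diagonal(scores)
                  + cross_entropy_on_diagonal(scores.T))

    # Peak response of each frame to its own audio: a coarse localisation.
    idx = np.arange(batch)
    peak = flat[idx, idx].argmax(axis=-1)
    peaks = np.stack([peak // width, peak % width], axis=1)
    return loss, peaks
```

In a sketch like this, the peak locations could seed pseudo-boxes and clustered audio embeddings could supply pseudo-class labels for training an image-only detector, which is the general two-stage idea the abstract describes; the concrete mechanism used in the paper may differ.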

contributionToPeriodical

Last updated on 08/07/2023

This paper was published in International Migration, Integration and Social Cohesion online publications.
