CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss

Srinivasa, Rakshith Sharma; Cho, Jaejin; Yang, Chouchang; Saidutta, Yashas Malur; Lee, Ching-Hua; Shen, Yilin; Jin, Hongxia

arXiv:2309.14580 (cs)

[Submitted on 26 Sep 2023]

Abstract: This paper considers contrastive training for cross-modal zero-shot transfer, wherein a pre-trained model in one modality is used for representation learning in another domain using pairwise data. The learnt models in the latter domain can then be used for a diverse set of tasks in a zero-shot way, similar to "Contrastive Language-Image Pre-training (CLIP)" and "Locked-image Tuning (LiT)", which have recently gained considerable attention. Most existing works for cross-modal representation alignment (including CLIP and LiT) use the standard contrastive training objective, which employs sets of positive and negative examples to align similar and repel dissimilar training data samples. However, similarity amongst training examples has a more continuous nature, thus calling for a more 'non-binary' treatment. To address this, we propose a novel loss function called Continuously Weighted Contrastive Loss (CWCL) that employs a continuous measure of similarity. With CWCL, we seek to align the embedding space of one modality with another. Owing to the continuous nature of similarity in the proposed loss function, these models outperform existing methods for zero-shot transfer across multiple models, datasets and modalities. In particular, we consider the modality pairs of image-text and speech-text, and our models achieve 5-8% (absolute) improvement over previous state-of-the-art methods in zero-shot image classification and 20-30% (absolute) improvement in zero-shot speech-to-intent classification and keyword classification.
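The abstract describes the loss only in words, so the following is a minimal sketch of the general idea rather than the authors' exact formulation. It contrasts the standard binary weighting (only the paired sample counts as a positive) with a continuous weighting. The specific choices here are assumptions made for illustration: the weights are cosine similarities within the frozen, pre-trained modality mapped to [0, 1], only one direction (trainable to frozen) is computed, and all function names are hypothetical.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def weighted_contrastive_loss(frozen, trainable, weights, temperature=0.1):
    """Cross-modal contrastive loss with per-pair weights (one direction only).

    frozen[i] and trainable[i] are embeddings of the i-th paired sample.
    weights[i][j] says how strongly sample i is pulled toward sample j.
    """
    p = [normalize(v) for v in trainable]
    q = [normalize(v) for v in frozen]
    total = 0.0
    for i in range(len(p)):
        logits = [dot(p[i], q_j) / temperature for q_j in q]
        log_probs = log_softmax(logits)
        w_sum = sum(weights[i])
        # Weighted average of log-probabilities over all candidates j.
        total -= sum(w * lp for w, lp in zip(weights[i], log_probs)) / w_sum
    return total / len(p)


def binary_weights(n):
    """Standard contrastive setting: only the paired sample is a positive."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def continuous_weights(frozen):
    """Assumed weighting: cosine similarity in the frozen modality, in [0, 1]."""
    q = [normalize(v) for v in frozen]
    return [[0.5 + 0.5 * dot(a, b) for b in q] for a in q]


# Toy batch: three paired samples with 2-D embeddings (made-up values).
frozen = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
trainable = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.6]]

loss_binary = weighted_contrastive_loss(frozen, trainable, binary_weights(3))
loss_continuous = weighted_contrastive_loss(
    frozen, trainable, continuous_weights(frozen)
)
print(loss_binary, loss_continuous)
```

With binary weights the function reduces to the usual one-directional contrastive (cross-entropy) objective. With continuous weights, samples that are similar in the frozen modality also contribute as partial positives, which is the 'non-binary' treatment the abstract argues for.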

Comments:	Accepted to Neural Information Processing Systems (NeurIPS) 2023 conference
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2309.14580 [cs.LG]
	(or arXiv:2309.14580v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2309.14580

Submission history

From: Rakshith Sharma Srinivasa
[v1] Tue, 26 Sep 2023 00:03:25 UTC (5,741 KB)
