Enhancing bidirectional association between deep image representations and loosely correlated texts

Qiuwen Chen, Qinru Qiu

Research output: Conference contribution (Chapter in Book/Report/Conference proceeding)

1 Scopus citation

Abstract

The problem of bridging the gap between images and natural language has gained increasing attention in recent years. This paper advances that line of study and improves bidirectional retrieval performance across the two modalities. Unlike previous works that target single sentences densely describing the objects in an image, we extend the focus to associating deep image representations with noisy texts that are only loosely correlated with the image. Building on text-image fragment embedding, our model employs a sequential configuration that connects two embedding stages: the first stage learns the relevancy of the text fragments, and the second stage uses the filtered output of the first to improve the matching results. The model also integrates multiple convolutional neural networks (CNNs) to construct the image fragments, so that rich context information, such as human faces, can be extracted to increase alignment accuracy. The proposed method is evaluated on both a synthetic dataset and a real-world dataset collected from a picture news website. The results show up to a 50% improvement in ranking performance over the comparison models.
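The abstract describes the two-stage pipeline only at a high level, so the following Python sketch illustrates the general idea under explicit assumptions and is not the authors' implementation. It assumes image and text fragments are already embedded in a shared vector space (the image fragments could come from several CNNs, e.g. an object network and a face network, but that step is not modeled). The image-text score sums thresholded inner products over all fragment pairs, in the spirit of earlier fragment-embedding work, and averages them as a simplifying choice. Stage one is replaced by a hypothetical linear relevance filter with hand-set parameters `w` and `b` rather than a learned model, and both stages share one embedding. All function names (`fragment_score`, `filter_relevant`, `bidirectional_ranks`) and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Fragments = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def fragment_score(image_frags: Fragments, text_frags: Fragments) -> float:
    """Image-text score: positive parts of inner products summed over all
    fragment pairs, then averaged (the averaging is an assumption)."""
    total = sum(max(0.0, dot(v, s)) for v in image_frags for s in text_frags)
    return total / (len(image_frags) * len(text_frags))


def filter_relevant(text_frags: Fragments, w: Vector, b: float) -> Fragments:
    """Hypothetical stage-one stand-in: keep text fragments whose linear
    relevance w.s + b is positive (w and b are hand-set, not learned)."""
    kept = [s for s in text_frags if dot(w, s) + b > 0]
    return kept or list(text_frags)  # fall back to all fragments if none survive


def rank_of_match(scores: List[float], correct: int) -> int:
    """1-based rank of the correct candidate when sorted by descending score."""
    return 1 + sum(1 for j, s in enumerate(scores) if j != correct and s > scores[correct])


def bidirectional_ranks(
    images: List[Fragments],
    texts: List[Fragments],
    scorer: Callable[[Fragments, Fragments], float],
) -> Tuple[List[int], List[int]]:
    """Image i is paired with text i; returns (image->text, text->image) ranks."""
    m = [[scorer(img, txt) for txt in texts] for img in images]
    i2t = [rank_of_match(m[i], i) for i in range(len(images))]
    t2i = [rank_of_match([row[j] for row in m], j) for j in range(len(texts))]
    return i2t, t2i


# Toy data (hypothetical): the third coordinate flags "noisy" text fragments.
images = [[[1.0, 0.0, 0.0]], [[0.0, 1.0, 0.0]]]
texts = [
    [[0.6, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 1.0, 1.0]],  # one relevant, two noisy
    [[0.0, 0.4, 0.0]],
]
w, b = [0.0, 0.0, -4.0], 2.0

one_stage = bidirectional_ranks(images, texts, fragment_score)
two_stage = bidirectional_ranks(
    images, texts, lambda img, txt: fragment_score(img, filter_relevant(txt, w, b))
)
print(one_stage)  # ([1, 2], [2, 1])
print(two_stage)  # ([1, 1], [1, 1])
```

In this toy case the two noisy fragments of the first text align with the second image and push the correct matches down to rank 2; filtering them out before scoring restores rank 1 in both retrieval directions. The numbers only illustrate the mechanism and say nothing about the results reported in the paper.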

Original language: English (US)
Title of host publication: 2016 International Joint Conference on Neural Networks, IJCNN 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3164-3171
Number of pages: 8
Volume: 2016-October
ISBN (Electronic): 9781509006199
DOI: 10.1109/IJCNN.2016.7727603
State: Published - Oct 31 2016
Event: 2016 International Joint Conference on Neural Networks, IJCNN 2016 - Vancouver, Canada
Duration: Jul 24 2016 to Jul 29 2016

Other

Other: 2016 International Joint Conference on Neural Networks, IJCNN 2016
Country: Canada
City: Vancouver
Period: 7/24/16 to 7/29/16

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence


Cite this

    Chen, Q., & Qiu, Q. (2016). Enhancing bidirectional association between deep image representations and loosely correlated texts. In 2016 International Joint Conference on Neural Networks, IJCNN 2016 (Vol. 2016-October, pp. 3164-3171). [7727603] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IJCNN.2016.7727603