Enhancing bidirectional association between deep image representations and loosely correlated texts

Qiuwen Chen, Qinru Qiu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

The problem of bridging the gap between images and natural language has gained increasing attention in recent years. This paper continues that line of study and improves bidirectional retrieval performance across the two modalities. Unlike previous works that target a single sentence densely describing the objects in an image, we extend the focus to associating deep image representations with noisy texts that are only loosely correlated. Based on text-image fragment embedding, our model employs a sequential configuration that connects two embedding stages. The first stage learns the relevance of the text fragments, and the second stage uses the filtered output of the first to improve the matching results. The model also integrates multiple convolutional neural networks (CNNs) to construct the image fragments, from which rich context information such as human faces can be extracted to increase the alignment accuracy. The proposed method is evaluated on both a synthetic dataset and a real-world dataset collected from a picture news website. The results show up to a 50% improvement in ranking performance over the comparison models.
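
The two-stage idea in the abstract (filter the text fragments by relevance, then match the surviving fragments against image fragments and rank in both directions) can be illustrated with a toy sketch. This is not the authors' model: the relevance scores are supplied by hand rather than learned, the fragment vectors are made-up 2-D embeddings, and the names `filter_fragments`, `match_score`, `retrieve`, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def filter_fragments(text_frags, relevance, threshold=0.5):
    """Stage 1 (assumed form): keep fragments whose relevance meets the threshold.

    In the paper the relevance is learned; here it is a hand-supplied score.
    Falls back to all fragments if nothing passes the threshold.
    """
    kept = [f for f, r in zip(text_frags, relevance) if r >= threshold]
    return kept or list(text_frags)


def match_score(image_frags, text_frags):
    """Stage 2 (assumed form): average, over text fragments, of the best
    cosine similarity to any image fragment."""
    if not image_frags or not text_frags:
        return 0.0
    best = [max(cosine(t, i) for i in image_frags) for t in text_frags]
    return sum(best) / len(best)


def retrieve(query_frags, candidates, query_is_image):
    """Rank candidate names by match score, best first (ties broken by name)."""
    scored = []
    for name, frags in candidates.items():
        img, txt = (query_frags, frags) if query_is_image else (frags, query_frags)
        scored.append((match_score(img, txt), name))
    return [name for _, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Toy data: one image fragment per image, and a noisy text for img_a whose
# two low-relevance fragments happen to point toward img_b.
images = {"img_a": [[1.0, 0.0]], "img_b": [[0.0, 1.0]]}
text_a = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
rel_a = [0.9, 0.1, 0.2]  # hypothetical stage-1 relevance scores
text_b = [[0.0, 1.0]]

filtered_a = filter_fragments(text_a, rel_a)

# Text-to-image retrieval, without and with the stage-1 filter.
unfiltered_ranking = retrieve(text_a, images, query_is_image=False)
filtered_ranking = retrieve(filtered_a, images, query_is_image=False)

# Image-to-text retrieval using the filtered text fragments.
texts = {"text_a": filtered_a, "text_b": text_b}
image_to_text = retrieve(images["img_a"], texts, query_is_image=True)
```

In this toy setup the unfiltered noisy text ranks the wrong image first, while the filtered text ranks `img_a` first, which is the kind of effect the sequential configuration is meant to produce. Real systems would learn both the embeddings and the relevance scores from data.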

Original language: English (US)
Title of host publication: 2016 International Joint Conference on Neural Networks, IJCNN 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3164-3171
Number of pages: 8
Volume: 2016-October
ISBN (Electronic): 9781509006199
DOIs: https://doi.org/10.1109/IJCNN.2016.7727603
State: Published - Oct 31 2016
Event: 2016 International Joint Conference on Neural Networks, IJCNN 2016 - Vancouver, Canada
Duration: Jul 24 2016 to Jul 29 2016

Other

Other: 2016 International Joint Conference on Neural Networks, IJCNN 2016
Country: Canada
City: Vancouver
Period: 7/24/16 to 7/29/16

Fingerprint

Websites
Neural networks

ASJC Scopus subject areas

  • Software
  • Artificial Intelligence

Cite this

Chen, Q., & Qiu, Q. (2016). Enhancing bidirectional association between deep image representations and loosely correlated texts. In 2016 International Joint Conference on Neural Networks, IJCNN 2016 (Vol. 2016-October, pp. 3164-3171). [7727603] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IJCNN.2016.7727603

Enhancing bidirectional association between deep image representations and loosely correlated texts. / Chen, Qiuwen; Qiu, Qinru.

2016 International Joint Conference on Neural Networks, IJCNN 2016. Vol. 2016-October Institute of Electrical and Electronics Engineers Inc., 2016. p. 3164-3171 7727603.

Chen, Q & Qiu, Q 2016, Enhancing bidirectional association between deep image representations and loosely correlated texts. in 2016 International Joint Conference on Neural Networks, IJCNN 2016. vol. 2016-October, 7727603, Institute of Electrical and Electronics Engineers Inc., pp. 3164-3171, 2016 International Joint Conference on Neural Networks, IJCNN 2016, Vancouver, Canada, 7/24/16. https://doi.org/10.1109/IJCNN.2016.7727603
Chen Q, Qiu Q. Enhancing bidirectional association between deep image representations and loosely correlated texts. In 2016 International Joint Conference on Neural Networks, IJCNN 2016. Vol. 2016-October. Institute of Electrical and Electronics Engineers Inc. 2016. p. 3164-3171. 7727603 https://doi.org/10.1109/IJCNN.2016.7727603
Chen, Qiuwen ; Qiu, Qinru. / Enhancing bidirectional association between deep image representations and loosely correlated texts. 2016 International Joint Conference on Neural Networks, IJCNN 2016. Vol. 2016-October Institute of Electrical and Electronics Engineers Inc., 2016. pp. 3164-3171
@inproceedings{734451775786485bb48b57cbb3f083a0,
title = "Enhancing bidirectional association between deep image representations and loosely correlated texts",
abstract = "The problem of bridging the gap between images and natural language has gained increasing attention in recent years. This paper continues that line of study and improves bidirectional retrieval performance across the two modalities. Unlike previous works that target a single sentence densely describing the objects in an image, we extend the focus to associating deep image representations with noisy texts that are only loosely correlated. Based on text-image fragment embedding, our model employs a sequential configuration that connects two embedding stages. The first stage learns the relevance of the text fragments, and the second stage uses the filtered output of the first to improve the matching results. The model also integrates multiple convolutional neural networks (CNNs) to construct the image fragments, from which rich context information such as human faces can be extracted to increase the alignment accuracy. The proposed method is evaluated on both a synthetic dataset and a real-world dataset collected from a picture news website. The results show up to a 50{\%} improvement in ranking performance over the comparison models.",
author = "Qiuwen Chen and Qinru Qiu",
year = "2016",
month = "10",
day = "31",
doi = "10.1109/IJCNN.2016.7727603",
language = "English (US)",
volume = "2016-October",
pages = "3164--3171",
booktitle = "2016 International Joint Conference on Neural Networks, IJCNN 2016",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Enhancing bidirectional association between deep image representations and loosely correlated texts

AU - Chen, Qiuwen

AU - Qiu, Qinru

PY - 2016/10/31

Y1 - 2016/10/31

N2 - The problem of bridging the gap between images and natural language has gained increasing attention in recent years. This paper continues that line of study and improves bidirectional retrieval performance across the two modalities. Unlike previous works that target a single sentence densely describing the objects in an image, we extend the focus to associating deep image representations with noisy texts that are only loosely correlated. Based on text-image fragment embedding, our model employs a sequential configuration that connects two embedding stages. The first stage learns the relevance of the text fragments, and the second stage uses the filtered output of the first to improve the matching results. The model also integrates multiple convolutional neural networks (CNNs) to construct the image fragments, from which rich context information such as human faces can be extracted to increase the alignment accuracy. The proposed method is evaluated on both a synthetic dataset and a real-world dataset collected from a picture news website. The results show up to a 50% improvement in ranking performance over the comparison models.

UR - http://www.scopus.com/inward/record.url?scp=85007236248&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85007236248&partnerID=8YFLogxK

U2 - 10.1109/IJCNN.2016.7727603

DO - 10.1109/IJCNN.2016.7727603

M3 - Conference contribution

AN - SCOPUS:85007236248

VL - 2016-October

SP - 3164

EP - 3171

BT - 2016 International Joint Conference on Neural Networks, IJCNN 2016

PB - Institute of Electrical and Electronics Engineers Inc.

ER -