Abstract: In this paper, we propose IRMatch, a model for matching images and sentences based on implication relations, which addresses the problem of non-equivalent semantics between images and sentences. The IRMatch model first maps images and sentences into a common semantic space using convolutional neural networks, and then mines implication relations between images and sentences with a learning algorithm that introduces maximum soft margin strategies; this draws related images and sentences closer together in the common semantic space and makes the matching scores between images and sentences more reasonable. Based on the IRMatch model, we implement approaches for bidirectional image and sentence retrieval and compare them with approaches built on existing image-sentence matching models on the Flickr8k, Flickr30k and Microsoft COCO datasets. Experimental results show that our retrieval approaches perform better in terms of R@1, R@5, R@10 and Med r on all three datasets.
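The abstract does not state the training objective explicitly; as a hedged sketch, one common maximum soft margin formulation for bidirectional image-sentence ranking in a common space is given below, where the matching score S(i, s) and the margin m are illustrative symbols rather than the paper's own notation, and the actual IRMatch implication-relation objective may differ.

% Hedged sketch: a standard bidirectional soft-margin ranking objective;
% S(i, s) and m are assumed symbols, not confirmed by the abstract.
\[
\mathcal{L} \;=\; \sum_{(i,s)} \Big[ \sum_{s^{-}} \max\big(0,\; m - S(i,s) + S(i,s^{-})\big) \;+\; \sum_{i^{-}} \max\big(0,\; m - S(i,s) + S(i^{-},s)\big) \Big]
\]

Here (i, s) ranges over matched image-sentence pairs, while s^{-} and i^{-} denote non-matching sentences and images; minimizing this loss pushes matched pairs to score at least m higher than mismatched ones, which is consistent with the abstract's description of strengthening the proximity of related images and sentences in the common semantic space.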