Abstract
Content-Based Image Retrieval (CBIR) is a highly active research field with numerous applications that are currently expanding beyond traditional CBIR methodologies. In this paper, a CBIR methodology is proposed to meet such demands. The query inputs of the proposed methodology are an image and a text. For instance, given an image, a user would like to obtain a similar one with some modification described in text form, which we refer to as a text modifier. The proposed methodology uses a set of neural networks that operate in feature space and perform feature composition in a single, well-understood domain: the textual feature domain. In this methodology, a ResNet is used to extract image features and an LSTM to extract text features, which together form the query inputs. The proposed methodology uses a set of three single-hidden-layer non-linear feedforward networks in a cascading structure, labeled NetA, NetC, and NetB. NetA maps image features to corresponding textual features. NetC composes the textual features produced by NetA with the text-modifier features to form the target image's textual features. NetB maps the target textual features to target image features, which are used to recall the target image from the image base based on cosine similarity. The proposed architecture was tested using ResNet-18, ResNet-50, and ResNet-152 for extracting image features. The testing results are promising and, to our knowledge, are competitive with the most recent approaches, as listed in Section 5.
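The cascade described above (NetA composing with the text modifier via NetC, then NetB mapping back to image-feature space for cosine-similarity recall) can be sketched as follows. This is a minimal illustrative sketch with assumed dimensions, random weights, and ReLU activations; it is not the paper's actual configuration or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(dim_in, dim_hidden, dim_out):
    """Single-hidden-layer non-linear feedforward network (ReLU assumed)."""
    w1 = rng.standard_normal((dim_in, dim_hidden)) * 0.1
    w2 = rng.standard_normal((dim_hidden, dim_out)) * 0.1
    return lambda x: np.maximum(x @ w1, 0.0) @ w2

# Feature sizes are illustrative (e.g. a ResNet embedding and an LSTM
# text embedding); the paper does not fix them here.
IMG_DIM, TXT_DIM = 512, 256
net_a = mlp(IMG_DIM, 512, TXT_DIM)      # NetA: image features -> textual features
net_c = mlp(2 * TXT_DIM, 512, TXT_DIM)  # NetC: compose with text-modifier features
net_b = mlp(TXT_DIM, 512, IMG_DIM)      # NetB: textual features -> image features

def retrieve(query_img_feat, text_mod_feat, image_base):
    """Return the index of the image-base entry most similar to the target."""
    t_query = net_a(query_img_feat)                              # NetA
    t_target = net_c(np.concatenate([t_query, text_mod_feat]))   # NetC
    v_target = net_b(t_target)                                   # NetB
    # Cosine similarity between the predicted target features and every
    # image in the base; recall the best match.
    sims = image_base @ v_target / (
        np.linalg.norm(image_base, axis=1) * np.linalg.norm(v_target) + 1e-9)
    return int(np.argmax(sims))

image_base = rng.standard_normal((100, IMG_DIM))
idx = retrieve(rng.standard_normal(IMG_DIM),
               rng.standard_normal(TXT_DIM),
               image_base)
```

With trained weights, `idx` would point at the image in the base whose features best match the composed query; here the weights are random, so only the data flow of the cascade is demonstrated.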