Abstract
Big data comprises a variety of data types, including images and text. In particular, image-based research on face recognition and object detection has been conducted in diverse areas. Deep learning requires a massive amount of data to train a model accurately. Because the amount of data collected differs across areas, data for deep learning analysis are often insufficient. Accordingly, a method is needed for training a model effectively and predicting results accurately with only a small amount of data. In addition, captions and tags are generated to obtain image information. Tagging expresses an image with individual words, whereas captioning composes words into a sentence. For this reason, captioning can convey more detailed image information than tagging. However, when a caption is generated from words, an end-to-end approach suffers from degraded performance if labeled data are insufficient. As a solution to this problem, meta-learning, which can learn from a small amount of data, is applied. This study proposes a captioning model based on meta-learning that uses prior-convergence knowledge for explainable images. The proposed method collects multimodal image data for predicting image information. From the collected data, attributes representing object information and context information are used. Then, with a small amount of data, meta-learning is applied in a bottom-up manner to generate a caption sentence, which mitigates the data-shortage problem. Finally, for image feature extraction, a convolutional network and an LSTM for captioning are established, and the basis for an explanation is generated through a reverse operation. The generated basis is an image object, and an appropriate explanatory sentence is displayed for each particular object. Performance is evaluated for accuracy in two ways. First, the BLEU score is evaluated with and without meta-learning. Second, the proposed prior-knowledge-based captioning model, an RNN-based captioning model, and a bidirectional RNN-based captioning model are compared in terms of BLEU score. Through the proposed method, the LSTM's bottom-up approach reduces the cost of improving image resolution and solves the data-shortage problem through meta-learning. In addition, the basis of image information can be identified using captions, and information about images can be described more accurately based on XAI.
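To make the encoder-decoder structure described above concrete, the following is a minimal PyTorch sketch of a convolutional feature extractor paired with an LSTM caption decoder. It is an illustration under stated assumptions, not the authors' implementation: the layer sizes, class names (EncoderCNN, DecoderLSTM), vocabulary size, and image resolution are all hypothetical, and the meta-learning outer loop and reverse-operation explanation step are omitted.

```python
import torch
import torch.nn as nn

class EncoderCNN(nn.Module):
    """Small convolutional encoder mapping an image to a feature vector.

    Stands in for the paper's feature-extraction network; the layer
    configuration here is illustrative only.
    """
    def __init__(self, embed_size: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),   # 224 -> 112
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 112 -> 56
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # global average pool -> (B, 32, 1, 1)
        )
        self.fc = nn.Linear(32, embed_size)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        features = self.conv(images).flatten(1)   # (B, 32)
        return self.fc(features)                  # (B, embed_size)

class DecoderLSTM(nn.Module):
    """LSTM decoder generating a caption conditioned on image features."""
    def __init__(self, embed_size: int, hidden_size: int, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features: torch.Tensor, captions: torch.Tensor) -> torch.Tensor:
        # Prepend the image feature as the first "token" of the sequence,
        # so each generated word is conditioned on the visual context.
        embeddings = self.embed(captions)                           # (B, T, E)
        inputs = torch.cat([features.unsqueeze(1), embeddings], 1)  # (B, T+1, E)
        hidden, _ = self.lstm(inputs)                               # (B, T+1, H)
        return self.fc(hidden)                                      # (B, T+1, V)

# Toy forward pass: a batch of 2 RGB images with 5-token captions.
encoder = EncoderCNN(embed_size=64)
decoder = DecoderLSTM(embed_size=64, hidden_size=128, vocab_size=1000)
images = torch.randn(2, 3, 224, 224)
captions = torch.randint(0, 1000, (2, 5))
logits = decoder(encoder(images), captions)
print(logits.shape)  # torch.Size([2, 6, 1000])
```

In a meta-learning setting such as the one the abstract describes, this encoder-decoder pair would serve as the base model whose parameters are adapted across small per-task caption datasets; the per-word logits above would be trained with a cross-entropy loss against the ground-truth caption tokens.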