The amount of multi-modal data available on the Internet is enormous. Cross-modal hash retrieval maps heterogeneous cross-modal data into a single Hamming space to offer fast and flexible retrieval services. However, existing cross-modal methods mainly rely on feature-level similarity between multi-modal data and ignore both the relative rankings and the label-level fine-grained similarity of neighboring instances. To address these issues, we propose a novel Deep Cross-modal Hashing based on Semantic Consistent Ranking (DCH-SCR) that comprehensively investigates the intra-modal semantic similarity relationship. Firstly, to the best of our knowledge, this is an early attempt to preserve semantic similarity for cross-modal hashing retrieval by combining label-level and feature-level information. Secondly, the inherent gap between modalities is narrowed by developing a ranking alignment loss function. Thirdly, compact and efficient hash codes are optimized based on the common semantic space. Finally, we use the gradient to specify the optimization direction and introduce the Normalized Discounted Cumulative Gain (NDCG) to achieve varying optimization strengths for data pairs with different similarities.
Extensive experiments on three real-world image-text retrieval datasets demonstrate the superiority of DCH-SCR over several state-of-the-art cross-modal retrieval methods.
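As an illustration of the NDCG measure mentioned above, the sketch below shows the standard DCG/NDCG computation over a ranked list of relevance grades; it is a minimal reference implementation of the metric itself, not the authors' weighting scheme (the function names and the idea of using the per-position discount as a pair weight are our own illustrative assumptions).

```python
import math

def dcg(relevances):
    # Discounted Cumulative Gain: each relevance grade is discounted
    # by log2 of its (1-based) rank position + 1.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    # Normalize by the DCG of the ideal (descending) ordering,
    # so a perfectly ranked list scores 1.0.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

Because the discount decays with rank, errors among highly similar (top-ranked) pairs reduce NDCG more than errors among dissimilar ones, which is what makes it a natural source of similarity-dependent optimization strengths.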