The need for automated image description systems has grown significantly with the rise of multimedia content across diverse domains such as social media, digital libraries, and accessibility technologies. Smart Caption presents a novel approach to generating intelligent, context-aware image descriptions by combining Vision Transformers (ViTs) with a Natural Language Processing (NLP) Transformer decoder. Unlike traditional convolution-based methods, Vision Transformers treat images as sequences of patches, enabling them to capture global image features more effectively. These extracted visual features are then processed by a Transformer-based decoder to produce coherent and contextually appropriate captions, bridging the gap between image recognition and natural language generation. Our approach leverages the attention mechanisms inherent in both the Vision and NLP Transformers to enhance the quality of image descriptions. The ViT architecture is designed to focus on relevant regions within an image, while the Transformer decoder generates fluent, detailed descriptions by attending to the most significant visual features. This two-stage process improves the precision of object detection, relationship understanding, and scene context in generated captions. Experimental results demonstrate that Smart Caption outperforms traditional methods in accuracy, contextual relevance, and fluency, marking a significant step forward in automated image captioning. Through this research, we aim to provide a scalable, efficient, and intelligent solution for image description, with potential applications in accessibility tools for visually impaired users, digital content indexing, and automated content generation. Our findings highlight the strengths of transformer-based models in bridging vision and language tasks, offering promising directions for future research in multimodal AI.
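The abstract does not specify implementation details, so the following is only a minimal sketch of the ViT-encoder/Transformer-decoder pipeline it describes, assuming Hugging Face's VisionEncoderDecoderModel with a pretrained ViT encoder and a GPT-2 decoder. The particular checkpoints, input file name, and generation settings are illustrative assumptions, not the paper's actual configuration.

```python
from PIL import Image
from transformers import AutoTokenizer, ViTImageProcessor, VisionEncoderDecoderModel

# Pair a ViT encoder (patch-sequence features) with a GPT-2 text decoder.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k", "gpt2"
)
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# GPT-2 has no pad token by default; reuse EOS so generation can pad safely.
tokenizer.pad_token = tokenizer.eos_token
model.config.decoder_start_token_id = tokenizer.bos_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# Hypothetical input image; the model would normally be fine-tuned on
# image-caption pairs before the generated text is meaningful.
image = Image.open("example.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

# The decoder cross-attends to the ViT patch embeddings while generating tokens.
output_ids = model.generate(pixel_values, max_length=32, num_beams=4)
caption = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```

In this setup the ViT supplies a sequence of patch embeddings and the decoder's cross-attention selects the most relevant ones at each generation step, which mirrors the two-stage attention process the abstract highlights.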