Abstract

This study presents an innovative approach to enhancing question-answering (QA) systems that combines a RoBERTa-based architecture with complexity-enhanced input features. The work comprises four primary parts: data preprocessing, feature engineering, model building, and training methodology. We propose a Python function that uses readability measures and natural language processing techniques to compute linguistic-difficulty metrics for input sentences. TensorFlow Datasets (TFDS) is then used to load and preprocess the Stanford Question Answering Dataset (SQuAD) for effective training. Pre-trained GloVe word embeddings are combined with the complexity metrics to produce input features that add contextual information to the input representation. With these enriched features, a question-answering model based on the RoBERTa architecture is trained using the AdamW optimizer and CrossEntropyLoss, iterating over epochs to optimize the model's parameters and minimize the loss function. An independent validation dataset is used to evaluate the model's performance, demonstrating the effectiveness of the proposed method in improving the accuracy and robustness of the QA system. Overall, this work offers a structured strategy for improving QA systems by fusing a state-of-the-art neural architecture with complexity-enriched input features.
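
To illustrate the kind of complexity function the abstract describes, the following is a minimal sketch, not the authors' exact implementation: it assumes the textstat package for readability scores and uses a hypothetical helper named complexity_metrics; the specific metrics used in the paper may differ.

    # Minimal sketch of a per-sentence linguistic-complexity function.
    # Assumes the `textstat` package; the function name and metric set are illustrative.
    import textstat

    def complexity_metrics(sentence: str) -> dict:
        """Return a small set of readability/complexity scores for one sentence."""
        tokens = sentence.split()                      # simple whitespace tokenization
        word_count = len(tokens)
        avg_word_len = sum(len(t) for t in tokens) / max(word_count, 1)
        return {
            "flesch_reading_ease": textstat.flesch_reading_ease(sentence),
            "flesch_kincaid_grade": textstat.flesch_kincaid_grade(sentence),
            "gunning_fog": textstat.gunning_fog(sentence),
            "word_count": word_count,
            "avg_word_length": avg_word_len,
        }

    # Example usage on a SQuAD-style question
    print(complexity_metrics("When did the Norman conquest of England begin?"))

In the described pipeline, scores such as these would be concatenated with the GloVe-based embedding features before being fed to the RoBERTa-based model.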
