Abstract

Natural Language Processing (NLP) tasks frequently suffer from insufficient or skewed data. One practical remedy is to generate additional textual data. Text Data Augmentation (TDA) makes small changes to existing text at the character, word, or sentence level to produce synthetic data, which is then fed into the data loaders used to train the model. With synthetic data, models can learn from a wider range of instances and thereby improve their robustness and generalization. Although the NLP community has studied many data augmentation (DA) approaches extensively, recent research suggests that the relationship between the DA techniques currently in use is not fully understood in practice. This study therefore applies and extends recent advances in TDA, covering a variety of tools across multiple settings and contexts. To provide a thorough practical implementation of NLP DA approaches, comparing their performance and highlighting significant similarities and differences across these scenarios, this work relies on both easy data augmentation tools and neural-based augmentation tools. The results suggest that some common DA techniques may be unsuitable in certain circumstances or text environments. Specifically, according to the initial results, the context and word count of a text may have a significant impact on the quality of the synthetic data.
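As a rough illustration of the word-level changes described in the abstract, the sketch below applies two simple operations in the spirit of easy data augmentation: random word swap and random word deletion. It is not the study's implementation. The function names (`random_swap`, `random_deletion`, `augment`) and the parameters (`n_aug`, `p_delete`, `n_swaps`, `seed`) are hypothetical and chosen only for this example.

```python
import random


def random_swap(words, n_swaps, rng):
    """Return a copy of `words` with two random positions swapped, n_swaps times."""
    words = list(words)
    if len(words) < 2:
        return words  # nothing to swap in a 0- or 1-word text
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words


def random_deletion(words, p, rng):
    """Drop each word independently with probability p, keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]


def augment(sentence, n_aug=4, p_delete=0.1, n_swaps=1, seed=0):
    """Generate n_aug synthetic variants, alternating swap and deletion (assumed scheme)."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    words = sentence.split()
    variants = []
    for k in range(n_aug):
        if k % 2 == 0:
            new_words = random_swap(words, n_swaps, rng)
        else:
            new_words = random_deletion(words, p_delete, rng)
        variants.append(" ".join(new_words))
    return variants


if __name__ == "__main__":
    for variant in augment("data augmentation can help when training data is scarce"):
        print(variant)
```

With a very short input, operations like these have few positions to work with, so the variants are either near-duplicates of the original or lose most of its content. That is one plausible reading of why word count could matter for synthetic data quality, though the abstract does not state the mechanism.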
