Abstract
Context:Utilizing pretrained language models (PLMs) has become common practice in maintaining the content quality of question-answering (Q&A) websites. However, evaluating the effectiveness of PLMs poses a challenge as they tend to provide local optima rather than global optima. Objective:In this study, we propose using semantic-preserving Metamorphic Relations (MRs) derived from Metamorphic Testing (MT) to address this challenge and validate PLMs. Methods:To validate four selected PLMs, we conducted an empirical experiment using a publicly available dataset comprising 60000 data points. We defined three groups of Metamorphic Relations (MRGs), consisting of thirteen semantic-preserving MRs, which were then employed to generate “Follow-up” testing datasets based on the original “Source” testing datasets. The PLMs were trained using a separate training dataset. A comparison was made between the predictions of the four trained PLMs for “Source” and “Follow-up” testing datasets in order to identify instances of violations, which corresponded to inconsistent predictions between the two datasets. If no violation was found, it indicated that the PLM was insensitive to the associate MR; thereby, the MR can be used for validation. In cases where no violation occurred across the entire MRG, non-violation regions were identified and supported simulation metamorphic testing. Results:The results of this study demonstrated that the proposed MRs could effectively serve as a validation tool for content quality classification on Stack Overflow Q&A using PLMs. One PLM did not violate the “Uppercase conversion” MRG and the “Duplication” MRG. Furthermore, the absence of violations in the MRGs allowed for the identification of non-violation regions, confirming the ability of the proposed MRs to support simulation metamorphic testing. Conclusion:The experimental findings indicate that the proposed MRs can validate PLMs effectively and support simulation metamorphic testing for PLMs. However, further investigations are required to enhance the semantic comprehension and common sense knowledge of PLMs and explore highly informative statistical patterns of PLMs, in order to improve their overall performance.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.