The pervasive availability of vast online information has fundamentally altered our approach to acquiring knowledge. Nevertheless, this wealth of data has also presented significant challenges to academic integrity, notably in the realm of cross-lingual plagiarism. This type of plagiarism involves the unauthorized copying or translation of ideas or works from one language into another without proper citation. This research introduces a methodology for identifying multilingual plagiarism, utilizing a pre-trained multilingual bidirectional and auto-regressive transformer (mBART) model for document feature extraction. Additionally, a Siamese long short-term memory (SLSTM) model is employed for classifying pairs of documents as either "plagiarized" or "non-plagiarized". Our approach exhibits notable performance across various languages, including English (En), Spanish (Es), German (De), and French (Fr). Notably, experiments focusing on the En-Fr language pair yielded exceptional results, with an accuracy of 98.83%, precision of 98.42%, recall of 99.32%, and F-score of 98.87%. For En-Es, the model achieved an accuracy of 97.94%, precision of 98.57%, recall of 97.47%, and an F-score of 98.01%. In the case of En-De, the model demonstrated an accuracy of 95.59%, precision of 95.21%, recall of 96.85%, and F-score of 96.02%. These outcomes underscore the effectiveness of combining the mBART transformer and SLSTM models for cross-lingual plagiarism detection.
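To make the described pipeline concrete, the sketch below shows one plausible way to pair an mBART feature extractor with a Siamese LSTM classifier in PyTorch. It is a minimal illustration, not the authors' implementation: the `facebook/mbart-large-cc25` checkpoint, mean-free token-sequence pooling, the 128-unit hidden size, and the concatenation-based pair scoring are all assumptions, since the abstract does not specify these details.

```python
import torch
import torch.nn as nn
from transformers import MBartModel, MBartTokenizer

# Assumed checkpoint; the abstract only says "pre-trained mBART".
tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
mbart = MBartModel.from_pretrained("facebook/mbart-large-cc25")
mbart.eval()

def extract_features(text: str) -> torch.Tensor:
    """Encode a document with the mBART encoder; returns (1, seq_len, 1024)."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        return mbart.encoder(**inputs).last_hidden_state

class SiameseLSTM(nn.Module):
    """Shared-weight LSTM that scores a document pair as plagiarized or not."""
    def __init__(self, input_dim: int = 1024, hidden_dim: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim * 2, 2)  # two classes

    def encode(self, seq: torch.Tensor) -> torch.Tensor:
        # The same LSTM (same weights) encodes both documents of a pair.
        _, (h, _) = self.lstm(seq)
        return h[-1]  # final hidden state summarizes the document

    def forward(self, seq_a: torch.Tensor, seq_b: torch.Tensor) -> torch.Tensor:
        ha, hb = self.encode(seq_a), self.encode(seq_b)
        return self.classifier(torch.cat([ha, hb], dim=-1))

# Usage: score a candidate pair (e.g., an English source and a French suspect).
model = SiameseLSTM()
logits = model(extract_features("The cat sat on the mat."),
               extract_features("Le chat était assis sur le tapis."))
is_plagiarized = logits.argmax(dim=-1).item() == 1
```

Because mBART embeds all supported languages into a shared representation space, the Siamese branches can compare documents across language pairs (En-Fr, En-Es, En-De) without an explicit translation step, which is the premise the abstract's architecture relies on.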