Abstract

A style crack is a position in a multi-author document where the author's identity changes. This paper surveys the current state of research and theory in related fields, both domestically and internationally, and proposes a multi-feature document segmentation method for plagiarism detection. Seven text style features are used to recognize style cracks. After feature extraction, the features are fused and passed to an unsupervised machine learning algorithm: a clustering algorithm groups the style features, and the locations of style cracks are identified from the resulting clusters. Experiments show that the method is effective and achieves good results.
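The pipeline described above (extract style features per segment, cluster them without supervision, and report where the cluster label changes) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name its seven features or its clustering algorithm, so the four features below (average word length, words per sentence, type-token ratio, punctuation rate) and the small two-cluster k-means are stand-ins, and the function names `style_features`, `two_means`, and `find_style_cracks` are hypothetical.

```python
import re
from statistics import mean, pstdev


def style_features(segment):
    """Illustrative style vector for one text segment (not the paper's seven features)."""
    words = re.findall(r"[A-Za-z']+", segment)
    sentences = [s for s in re.split(r"[.!?]+", segment) if s.strip()]
    n_words = max(len(words), 1)
    return [
        mean([len(w) for w in words]) if words else 0.0,   # average word length
        n_words / max(len(sentences), 1),                  # words per sentence
        len({w.lower() for w in words}) / n_words,         # type-token ratio
        sum(segment.count(c) for c in ",;:") / n_words,    # punctuation rate
    ]


def standardize(rows):
    """Z-score each feature column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(x - m) / s for x, m, s in zip(r, mus, sds)] for r in rows]


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(rows, iters=20):
    """Tiny deterministic 2-means, seeded with the first row and the row farthest from it."""
    centers = [rows[0], max(rows, key=lambda r: dist2(r, rows[0]))]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [0 if dist2(r, centers[0]) <= dist2(r, centers[1]) else 1 for r in rows]
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = [mean(col) for col in zip(*members)]
    return labels


def find_style_cracks(segments):
    """Return indices i where segment i falls in a different cluster than segment i - 1."""
    labels = two_means(standardize([style_features(s) for s in segments]))
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


segments = [
    "The cat sat. The dog ran. We saw it. It was fun.",
    "The sun is hot. We go out. The sky is big. I like it.",
    "Notwithstanding considerable methodological disagreement, contemporary researchers, "
    "particularly statisticians, increasingly acknowledge substantial uncertainty; "
    "consequently, interpretation remains extraordinarily complicated.",
    "Furthermore, longitudinal investigations, although comparatively expensive, "
    "demonstrate remarkable consistency; nevertheless, generalization requires "
    "tremendously sophisticated experimentation.",
]
cracks = find_style_cracks(segments)
print(cracks)
```

In this toy input the first two segments use short, plain sentences and the last two use long, heavily punctuated ones, so the sketch reports a single boundary before the third segment. Fixing the number of clusters at two assumes exactly two authors; a real document would need the number of authors estimated or a different clustering method.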
