Document Segmentation Method Based on Style Feature Fusion

Gang Liu, Xu Cheng, Kai Wang, Tao Li, Wangyang Liu

doi:10.1088/1757-899x/646/1/012044

Open Access

Journal: IOP Conference Series: Materials Science and Engineering
Publication Date: Oct 1, 2019
License type: cc-by

Affiliation: Harbin Engineering University

Keywords: Unsupervised Machine Learning Algorithm, Multiple Authors

Abstract

A style crack is a position in a multi-author document where the author changes. This paper reviews the state of research and theory in related fields, both domestic and international, and proposes a multi-feature document segmentation method for plagiarism detection. Seven text style features are used to recognize style cracks. After feature extraction, multi-feature fusion is combined with an unsupervised machine learning algorithm: the fused style features are clustered, and the style cracks are located from the resulting clusters. Experiments indicate that the method is effective.
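
The pipeline described in the abstract (extract style features, fuse them, cluster, read off the cracks) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the seven features, the fusion rule, or the clustering algorithm, so the sketch uses four hypothetical features, z-score normalization as the fusion step, and a tiny two-cluster k-means. The function names (`style_features`, `fuse`, `two_means`, `find_style_cracks`) are likewise invented and are not taken from the paper.

```python
import re
from statistics import mean, pstdev


def style_features(text):
    """Four hypothetical style features (NOT the paper's seven)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)
    return [
        mean(len(w) for w in words) if words else 0.0,  # average word length
        n_words / max(len(sentences), 1),               # words per sentence
        sum(text.count(c) for c in ",;:") / n_words,    # punctuation marks per word
        len({w.lower() for w in words}) / n_words,      # type-token ratio
    ]


def fuse(rows):
    """Assumed fusion rule: z-score each feature column so none dominates."""
    cols = list(zip(*rows))
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iters=20):
    """Tiny k-means with k=2; deterministic init (first point and the point farthest from it)."""
    centers = [points[0], max(points, key=lambda p: dist2(p, points[0]))]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center, then recompute the centers.
        labels = [min((0, 1), key=lambda k: dist2(p, centers[k])) for p in points]
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [mean(c) for c in zip(*members)]
    return labels


def find_style_cracks(segments):
    """Return each index i where segment i falls in a different cluster than segment i-1."""
    labels = two_means(fuse([style_features(s) for s in segments]))
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
```

Calling `find_style_cracks` on a list of paragraphs returns the indices at which the cluster label changes between neighbouring paragraphs, which this sketch treats as candidate style cracks. Fixing the number of clusters at two assumes exactly two authors; a document with more authors would need a different cluster count or a method that estimates it.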
