A novel web page text information extraction method

Chongjun Wang,Peng Wei

doi:10.1109/itnec.2019.8729329

Publication Date: Mar 1, 2019

Citations: 6

Affiliation: Space Engineering University, And Technology Research (United Kingdom), Switch

#Web Page #Text Information

Abstract

The text content of today’s mainstream web pages generally exhibits multiple features, and pages can be divided into single-text pages and multi-text pages. To extract the text information of a web page, its position can be located accurately by using the multiple features of the text together with the conventions of web page design. Based on these characteristics, this paper proposes a method for extracting web page text information based on multi-feature fusion. Experiments on a large amount of data show that the method is general and highly accurate for text extraction from both single-text and multi-text web pages, and that it is well suited to web pages of various styles.
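
The abstract does not say which features are fused or how they are combined, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes three features often used in content extraction (text length, punctuation count, and link density), combines them as a weighted sum, and keeps every block whose score passes a threshold. The names `BlockCollector`, `fused_score`, and `extract_text`, together with all weights, caps, and the threshold, are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Assumed tag sets and punctuation marks; the paper does not specify any of these.
BLOCK_TAGS = {"p", "div", "li", "td", "section", "article", "h1", "h2", "h3"}
SKIP_TAGS = {"script", "style"}
PUNCTUATION = set(",.;:!?，。；：！？")


class BlockCollector(HTMLParser):
    """Split a page into text blocks at block-level tag boundaries."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # finished blocks as (text, link_chars)
        self._text = []       # text pieces of the block being built
        self._link_chars = 0  # characters of the current block inside <a>
        self._in_link = 0     # nesting depth of <a> tags
        self._skip = 0        # nesting depth of <script>/<style> tags

    def flush(self):
        """Close the current block and store it if it contains any text."""
        text = " ".join(" ".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link += 1
        elif tag in SKIP_TAGS:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style content
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())


def fused_score(text, link_chars, weights=(1.0, 1.0, 1.0)):
    """Fuse three assumed features into one score (higher means more body-like)."""
    if not text:
        return 0.0
    w_len, w_punct, w_link = weights
    length_feature = min(len(text) / 200.0, 1.0)
    punct_feature = min(sum(ch in PUNCTUATION for ch in text) / 5.0, 1.0)
    link_density = min(link_chars / len(text), 1.0)
    return w_len * length_feature + w_punct * punct_feature - w_link * link_density


def extract_text(html, threshold=0.5):
    """Return the text of every block whose fused score reaches the threshold."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    parser.flush()  # store any text left after the last block tag
    return [text for text, link_chars in parser.blocks
            if fused_score(text, link_chars) >= threshold]
```

Under these assumptions, a single-text page would typically yield one high-scoring block and a multi-text page several, while link-heavy blocks such as navigation menus score below the threshold and are dropped. Real pages would need tuned weights and probably additional features, such as position in the page layout.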
