Abstract

The text information on today's mainstream web pages generally has multiple features, and such pages can be divided into single-text pages and multi-text pages. To extract the text information from a web page, its position can be located accurately by combining the multiple features of the text with the rules of web page design. Building on these characteristics, this paper proposes a method for extracting web page text information based on multi-feature fusion. Experiments on a large amount of data show two things. First, the method is general and highly accurate when extracting text information from both single-text and multi-text web pages. Second, it is well suited to web pages of many different styles.
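The abstract does not state which features are fused or how they are combined. As a rough illustration only, the Python sketch below makes several assumptions that are not taken from the paper:

- It splits a page into blocks at block-level tags.
- It scores each block on three common features: text length, link density, and punctuation density.
- It fuses these features with hypothetical weights into a single score.
- It keeps every block above a hypothetical threshold, so that a multi-text page can yield more than one block.

The names BlockParser, fused_score, extract_text, WEIGHTS, and THRESHOLD are illustrative and do not come from the paper.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical choices for illustration only; the paper's actual features,
# weights, and threshold are not given in the abstract.
BLOCK_TAGS = {"p", "div", "li", "td", "article", "section", "h1", "h2", "h3"}
SKIP_TAGS = {"script", "style", "noscript"}
PUNCT = set(",.;:!?，。；：！？")
WEIGHTS = (0.4, 0.4, 0.2)  # length, non-link ratio, punctuation density
THRESHOLD = 0.6


@dataclass
class Block:
    text: str = ""
    link_text: str = ""


class BlockParser(HTMLParser):
    """Splits a page into candidate text blocks at block-level tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self.current = Block()
        self.link_depth = 0
        self.skip_depth = 0

    def _flush(self):
        # Keep the current block only if it holds non-whitespace text.
        if self.current.text.strip():
            self.blocks.append(self.current)
        self.current = Block()

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a":
            self.link_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth:
            return  # ignore script/style contents
        self.current.text += data
        if self.link_depth:
            self.current.link_text += data

    def close(self):
        super().close()
        self._flush()  # emit the last open block


def fused_score(block, weights=WEIGHTS):
    """Weighted sum of three normalized features, each in [0, 1]."""
    text = " ".join(block.text.split())
    links = " ".join(block.link_text.split())
    if not text:
        return 0.0
    length_f = min(len(text) / 100, 1.0)             # longer blocks score higher
    link_f = 1.0 - min(len(links) / len(text), 1.0)  # fewer links score higher
    punct_f = min(20 * sum(c in PUNCT for c in text) / len(text), 1.0)
    w_len, w_link, w_punct = weights
    return w_len * length_f + w_link * link_f + w_punct * punct_f


def extract_text(html, threshold=THRESHOLD):
    """Return every block whose fused score reaches the threshold."""
    parser = BlockParser()
    parser.feed(html)
    parser.close()
    return [" ".join(b.text.split()) for b in parser.blocks
            if fused_score(b) >= threshold]
```

The sketch selects blocks by threshold rather than keeping only the top-scoring one, which is what lets a multi-text page return several blocks. On real pages, the weights and threshold would need to be tuned against labeled data.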
