Abstract

The accuracy of information extraction by Web mining is limited by the diversity and complexity of Web pages. To improve the accuracy of Web-mining-based information extraction for building a science and technology basic information system, this paper proposes a novel multi-factor matching method. The method combines two elements. The first is the position, within the normalized text, of each word in the keyword corpus. The second is multi-factor matching between the keyword corpus and the normalized text extracted from Web pages retrieved by URL. The extracted fields are the name, sex, date of birth, hometown, and professional title of each science and technology expert. In experiments, the method reaches an accuracy rate of 95.64 percent and a recall rate of 99.69 percent. These results show that the proposed method satisfies the application requirements.
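The abstract names two ingredients, keyword positions in normalized text and multi-factor matching, but gives no further details. The following is a minimal sketch of how such matching *could* work. It is not the authors' algorithm. The keyword corpus, the three scoring factors (cue-word match, position in the text, and value-pattern match), their weights, the lower-casing normalization, and the helper names `normalize`, `candidates`, and `extract` are all hypothetical assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword corpus: each field maps to cue words that may precede its value.
KEYWORD_CORPUS = {
    "name": ["name"],
    "sex": ["sex", "gender"],
    "birth": ["born", "birth"],
    "hometown": ["hometown", "native of"],
    "title": ["title", "professional title"],
}

# Hypothetical value patterns, used as one matching factor.
VALUE_PATTERNS = {
    "sex": re.compile(r"^(male|female)$", re.IGNORECASE),
    "birth": re.compile(r"\d{4}"),
}

# Hypothetical factor weights; not taken from the paper.
WEIGHTS = {"cue": 1.0, "position": 0.5, "pattern": 1.5}


@dataclass
class Candidate:
    field: str
    value: str
    score: float


def normalize(text: str) -> str:
    """Assumed normalization: collapse whitespace and lower-case."""
    return re.sub(r"\s+", " ", text).strip().lower()


def candidates(text: str, field: str) -> list[Candidate]:
    """Score every value that follows a cue word for `field`."""
    found = []
    for cue in KEYWORD_CORPUS[field]:
        for m in re.finditer(re.escape(cue) + r"\s*[:：]?\s*", text):
            # The value runs from the end of the cue to the next delimiter.
            value = re.split(r"[;,.\n]", text[m.end():], maxsplit=1)[0].strip()
            if not value:
                continue
            # Position factor: cues that appear earlier in the text score higher.
            position_factor = 1.0 - m.start() / max(len(text), 1)
            # Pattern factor: 1 if the value fits the field's expected shape.
            pattern = VALUE_PATTERNS.get(field)
            pattern_factor = 1.0 if pattern is None or pattern.search(value) else 0.0
            score = (WEIGHTS["cue"]
                     + WEIGHTS["position"] * position_factor
                     + WEIGHTS["pattern"] * pattern_factor)
            found.append(Candidate(field, value, score))
    return found


def extract(raw: str) -> dict[str, str]:
    """Return the highest-scoring value for each field found in the text."""
    text = normalize(raw)
    result = {}
    for field in KEYWORD_CORPUS:
        cands = candidates(text, field)
        if cands:
            result[field] = max(cands, key=lambda c: c.score).value
    return result


if __name__ == "__main__":
    page = "Name: Zhang Wei; Sex: male; Born: 1962; Hometown: Hefei; Professional title: Professor."
    print(extract(page))
```

In this sketch, the pattern factor outweighs the position factor. A value that fits the expected shape (for example "female" for sex) therefore wins over an earlier but ill-formed candidate. Any real system would have to tune such weights against labeled pages.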
