Abstract

Web page classification is one of the most challenging research areas. The web holds a huge volume of unstructured, distributed documents spanning a variety of domains, so relying on a single basis for classifying all of it is extremely difficult. In addition, the web is full of noise that harms classifier performance, especially when it appears in the training data. It is generally more valuable to build domain-oriented classifiers (vertical classifiers) that classify pages related to a specific domain. This paper analyzes a new way of applying Bayes' theorem to build a Domain-Oriented Naive Bayes (DONB) classifier. A main contribution is a novel classification strategy that adds a continuous learning ability to Bayes' theorem, yielding a Continuous Learning Naive Bayes (CLNB) classifier. Because overfitting has a great impact on most web page classification techniques, continuous learning is proposed as a solution: it allows the classifier to adapt itself continuously to achieve better performance. Both classifiers are tested; experimental results show that CLNB demonstrates a significant performance improvement over DONB, with its accuracy reaching 94.1% after testing 1000 pages. Moreover, because CLNB keeps learning, further accuracy improvements are expected in future tests.
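
To make the idea of continuous learning concrete, the following is a minimal sketch of a multinomial Naive Bayes classifier whose counts can be updated one labelled page at a time. It is not the DONB or CLNB implementation described in the paper, whose details the abstract does not give. The class name `IncrementalNaiveBayes`, the methods `learn` and `predict`, the Laplace smoothing constant `alpha`, and the toy tokens and labels are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Hypothetical multinomial Naive Bayes with per-page count updates."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing (assumed)
        self.class_docs = Counter()              # pages seen per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.class_words = Counter()             # total tokens per class
        self.vocab = set()                       # all tokens seen so far

    def learn(self, tokens, label):
        """Fold one labelled page into the counts; no retraining is needed."""
        self.class_docs[label] += 1
        for tok in tokens:
            self.word_counts[label][tok] += 1
            self.class_words[label] += 1
            self.vocab.add(tok)

    def log_scores(self, tokens):
        """Return log P(class) + sum of log P(token | class) for each class."""
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_docs.items():
            score = math.log(n_docs / total_docs)
            denom = self.class_words[label] + self.alpha * vocab_size
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, tokens):
        """Pick the class with the highest posterior log score."""
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


clf = IncrementalNaiveBayes()
clf.learn(["python", "code", "compiler"], "programming")
clf.learn(["football", "goal", "league"], "sports")
print(clf.predict(["python", "compiler"]))

# A page labelled later simply updates the counts in place.
clf.learn(["match", "goal", "referee"], "sports")
print(clf.predict(["match", "referee"]))
```

Because the model is only a set of counts, each newly labelled page adjusts the class priors and token likelihoods directly. That is one simple way a Bayes classifier can keep adapting after its initial training; the paper's actual update rule may differ.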
