Abstract

Users commonly rely on search engines to find health information on websites. Our research focuses on content-rich health websites, because wrong information on these websites can threaten life. A search engine returns a list of URLs related to the search keyword, and users generally follow the top-ranked websites. Newly constructed websites do not yet have ratings, hit counts, or reviews, so the search engine does not rank them near the top. As a result, a newly constructed website with the same content as a top-ranked website misses out on the user's trust. Another problem is that the Google Search engine also displays phishing website URLs that appear similar to genuine websites. To solve these problems and enhance users' trust in health websites that are not at the top of the search results, we propose an approach that extracts all URLs returned for a keyword and identifies the legitimate ones using a machine learning classifier. Address bar features, domain name features, and HTML and JavaScript features were identified to build the dataset used for identifying legitimate URLs. Three classifiers (Decision Tree, Random Forest, and Support Vector Machine) were trained and evaluated. Decision Tree has the highest training accuracy (94.125), testing accuracy (92.75), and precision score (96.97). The cross-validation score of all three models is almost 93. Therefore, Decision Tree is used to identify legitimate websites. After the list of legitimate URLs is obtained, the content of each legitimate website is extracted. The semantic similarity between the content of the top-ranked legitimate website and the content of the other legitimate websites is computed using natural language processing techniques. The websites are then ranked by similarity, and a trust value is assigned, from highly trustable to less trustable.
We compared and correlated our results with Web of Trust, a reputation tool for trust analysis, and obtained a positive correlation. Thus, our approach filters out phishing websites and enhances trust in other websites that are not at the top of the search results.
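
The ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the similarity measure, so a bag-of-words cosine similarity stands in for the semantic similarity, and the function names, the thresholds (`high`, `low`), and the intermediate "moderately trustable" label are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity (a stand-in for semantic similarity)."""
    counts_a, counts_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def rank_by_similarity(reference_text, candidates):
    """Rank candidate pages (url -> text) by similarity to the reference page.

    reference_text is the content of the top-ranked legitimate website;
    candidates holds the other websites already classified as legitimate.
    """
    scored = [(url, cosine_similarity(reference_text, text))
              for url, text in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def trust_label(score, high=0.7, low=0.4):
    """Map a similarity score to a trust level (thresholds are hypothetical)."""
    if score >= high:
        return "highly trustable"
    if score >= low:
        return "moderately trustable"
    return "less trustable"
```

Calling `rank_by_similarity(top_page_text, other_pages)` and passing each score to `trust_label` yields an ordered list of URLs with trust levels. In practice, a sentence-embedding model or TF-IDF weighting would likely capture meaning better than raw token counts.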
