Abstract

Online healthcare resources pose problems of both quantity and quality. One strategy for addressing them is to create web portals centred on particular health topics and/or communities of users, giving access to a reduced corpus of information resources that meet quality and relevance criteria. In this paper we use the hyperspace analogue to language (HAL) to model the language use patterns of webpages as Semantic Spaces. We applied machine learning methods, including support vector machines (SVM), decision forests, and a novel summed similarity measure (SSM), to automatically classify webpages based on their Semantic Space models. On webpages from the Breast Cancer Knowledge Online portal, classification accuracy on metadata attributes was over 93% for ‘medical’ versus ‘supportive’ perspective, over 92% for disease stage of ‘early’ versus ‘advanced’, and over 90% for author credentials of ‘lay’ versus ‘clinician’. These results indicate that language use patterns can be used to automate such classification with useful levels of accuracy.
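
As a rough illustration of how a HAL-style Semantic Space can be built from text, the sketch below slides a fixed-size window over a token sequence and weights each co-occurrence by proximity. The function names `hal_space` and `cosine`, the window size, and the whitespace tokenisation are illustrative assumptions, not details taken from the paper, and the cosine comparison is a generic similarity, not the paper's SSM.

```python
from collections import defaultdict
from math import sqrt


def hal_space(tokens, window=5):
    """Build a HAL-style co-occurrence table (hypothetical helper).

    space[target][context] accumulates a weight each time `context`
    appears within `window` tokens before `target`.
    """
    space = defaultdict(lambda: defaultdict(float))
    for i, target in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break
            # Closer words weigh more: `window` at distance 1, down to 1.
            space[target][tokens[j]] += window - distance + 1
    return space


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy usage: whitespace tokenisation is an assumption for illustration.
page = "early stage breast cancer treatment options for early stage patients"
space = hal_space(page.split(), window=3)
similarity = cosine(space["cancer"], space["patients"])
```

Vectors of this kind, whether per word or aggregated per page, could then be passed to a classifier such as an SVM; how the paper aggregates them is not stated in the abstract.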
