Abstract

The IEEE International Conference on Healthcare Informatics 2015 (ICHI 2015) announced a challenge in the healthcare domain concerning the quality of health inquiries on social media. The challenge problem is to reduce repeated posts in patient support forums. This problem becomes progressively harder to control as the number of forum users grows and the forum's older posts are rarely searched. To address it, we used a model that measures the similarity of forum posts with the cosine similarity metric over term frequency-inverse document frequency (TF-IDF) vectors. We applied our model to the data provided by the challenge committee. Three graduate students annotated the data, and we used their agreement vote as the similarity judgment. Our model using cosine similarity and TF-IDF improved on existing models that primarily use topic modeling approaches such as Latent Dirichlet Allocation (LDA) and Latent Semantic Indexing (LSI).
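
As an illustration of the general technique named above, the following is a minimal sketch of scoring post similarity with cosine similarity over TF-IDF vectors. It is not the authors' implementation: the example posts, the function names (`tokenize`, `tfidf_vectors`, `cosine`), the regex tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs, no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed IDF (an assumption; other variants are equally common).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in df}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical forum posts, invented for this example.
posts = [
    "What are the side effects of metformin?",
    "Metformin side effects: what should I expect?",
    "How do I lower my blood pressure without medication?",
]

vectors = tfidf_vectors(posts)
sim_near = cosine(vectors[0], vectors[1])  # two posts on the same question
sim_far = cosine(vectors[0], vectors[2])   # unrelated post

print(f"near-duplicate pair: {sim_near:.3f}")
print(f"unrelated pair:      {sim_far:.3f}")
```

In a forum setting, a new post whose similarity to an existing post exceeds some threshold could be flagged as a likely repeat. The threshold would need to be tuned against annotated data such as the agreement votes described above.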
