Supervised and unsupervised learning models for pharmaceutical drug rating and classification using consumer generated reviews

Corban Allenbrand

doi:10.1016/j.health.2023.100288

Abstract

Optimization of medication therapy depends on maximizing the benefits and minimizing the side effects of medications. This research showed how a joint approach using text mining, natural language processing, and machine learning can provide information for personalized and optimized medication therapy. Reviews on the benefits and side effects of prescription and over-the-counter medications were used to determine how well an integrated supervised and unsupervised learning approach could learn medication satisfaction. Supervised learning with naïve Bayes, non-linear support vector machines with radial basis function kernels, and random forests of CART decision trees was evaluated with a micro-aggregated Matthews correlation coefficient and a macro-averaged F1 measure. Random forests outperformed support vector machines by almost 250% and naïve Bayes by 600% on the two evaluation metrics. All models performed better with three rating levels than with five. Topic modelling and stacked cluster analysis were coupled with part-of-speech tagging and text mining operations to establish a robust data preprocessing procedure that eliminates noisy features from the data. Unsupervised topic modelling and clustering provided an exploratory assessment of how easy supervised classification would be. Well-defined latent topics were discovered, including topics on “sleep quality”, “the opportunity to get back to work”, and “weight gain”. Overlapping clusters revealed that incorporating more information on social, demographic, or medical history variables could improve classifier performance. This research provided evidence that medication satisfaction can be learned with carefully designed joint supervised, unsupervised, and natural language learning techniques.
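The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard multiclass generalization of the Matthews correlation coefficient, which pools counts over all classes, and the usual unweighted mean of per-class F1 scores; whether these match the paper's exact "micro-aggregated" and "macro-averaged" definitions is an assumption. The function names, the three rating labels, and the toy predictions are hypothetical and are not taken from the study's data.

```python
from collections import Counter
from math import sqrt


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from pooled counts."""
    s = len(y_true)                                     # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_counts = Counter(y_true)                          # true count per class
    p_counts = Counter(y_pred)                          # predicted count per class
    labels = set(t_counts) | set(p_counts)
    cov = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    denom = sqrt(
        (s * s - sum(v * v for v in p_counts.values()))
        * (s * s - sum(v * v for v in t_counts.values()))
    )
    # Degenerate case (e.g. all predictions in one class): return 0.
    return cov / denom if denom else 0.0


# Hypothetical three-level ratings, for illustration only.
y_true = ["low", "low", "mid", "mid", "high", "high"]
y_pred = ["low", "mid", "mid", "mid", "high", "low"]

print(f"macro F1: {macro_f1(y_true, y_pred):.3f}")
print(f"MCC:      {multiclass_mcc(y_true, y_pred):.3f}")
```

Macro averaging gives every rating level equal weight regardless of how many reviews fall into it, while the correlation coefficient accounts for the full confusion between classes, so the two measures can respond differently to class imbalance.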

Full Text