Machine learning in medicine: a practical introduction to natural language processing

Conrad J Harrison, Chris J Sidey-Gibbons

doi:10.1186/s12874-021-01347-1

Open Access

Abstract

Background: Unstructured text, including medical records, patient feedback, and social media comments, can be a rich source of data for clinical research. Natural language processing (NLP) describes a set of techniques used to convert passages of written text into interpretable datasets that can be analysed by statistical and machine learning (ML) models. The purpose of this paper is to provide a practical introduction to contemporary techniques for the analysis of text-data, using freely-available software.

Methods: We performed three NLP experiments using publicly-available data obtained from medicine review websites. First, we conducted lexicon-based sentiment analysis on open-text patient reviews of four drugs: Levothyroxine, Viagra, Oseltamivir and Apixaban. Next, we used unsupervised ML (latent Dirichlet allocation, LDA) to identify similar drugs in the dataset, based solely on their reviews. Finally, we developed three supervised ML algorithms to predict whether a drug review was associated with a positive or negative rating. These algorithms were: a regularised logistic regression, a support vector machine (SVM), and an artificial neural network (ANN). We compared the performance of these algorithms in terms of classification accuracy, area under the receiver operating characteristic curve (AUC), sensitivity and specificity.

Results: Levothyroxine and Viagra were reviewed with a higher proportion of positive sentiments than Oseltamivir and Apixaban. One of the three LDA clusters clearly represented drugs used to treat mental health problems. A common theme suggested by this cluster was drugs taking weeks or months to work. Another cluster clearly represented drugs used as contraceptives. Supervised machine learning algorithms predicted positive or negative drug ratings with classification accuracies ranging from 0.664, 95% CI [0.608, 0.716] for the regularised regression to 0.720, 95% CI [0.664, 0.776] for the SVM.

Conclusions: In this paper, we present a conceptual overview of common techniques used to analyse large volumes of text, and provide reproducible code that can be readily applied to other research studies using open-source software.
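As a rough illustration of the lexicon-based sentiment analysis mentioned in the abstract, the sketch below counts positive and negative words in free-text reviews and reports the proportion that are positive. It is not the authors' code: the eight-word lexicon, the function names and the example reviews are all invented for illustration, and a real analysis would use a published sentiment lexicon and more careful tokenisation.

```python
import re
from collections import Counter

# Toy lexicon, invented for illustration only (an assumption, not from the paper).
TOY_LEXICON = {
    "effective": "positive",
    "better": "positive",
    "great": "positive",
    "relief": "positive",
    "nausea": "negative",
    "worse": "negative",
    "pain": "negative",
    "terrible": "negative",
}


def sentiment_counts(review):
    """Count lexicon matches in one review, grouped by sentiment label."""
    # Lower-case the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", review.lower())
    return Counter(TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON)


def proportion_positive(reviews):
    """Share of matched sentiment words that are positive, or None if no word matched."""
    total = Counter()
    for review in reviews:
        total.update(sentiment_counts(review))
    matched = total["positive"] + total["negative"]
    return total["positive"] / matched if matched else None


# Hypothetical reviews, written for this example.
example_reviews = [
    "Great relief, I feel better.",
    "Terrible nausea and more pain.",
]
print(proportion_positive(example_reviews))
```

Comparing this proportion across drugs is one simple way to arrive at statements of the kind reported in the Results, although word counting of this sort ignores negation ("not better") and context.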

Highlights

Unstructured text, including medical records, patient feedback, and social media comments, can be a rich source of data for clinical research
We aim to demonstrate to clinicians and qualitative researchers the type of text analyses that are performed with open source software, and provide practical understanding to academics that wish to apply these techniques in their own research
Levothyroxine and Viagra had a higher percentage of positive sentiments than Apixaban and Oseltamivir

Summary

Introduction

Unstructured text, including medical records, patient feedback, assessments of doctors’ performance and social media comments, can be a rich source of data for clinical research, clinical decision making and quality improvement. The purpose of this article is to provide a practical introduction to contemporary machine learning techniques for analysing passages of written text, using freely-available software. Where text-based data exist on the internet (for example, social media reviews of healthcare providers), it is technically possible to capture them using a process called web-scraping, which is straightforward to perform using open-source software [12]. Before attempting web-scraping, researchers must ensure that they do not breach any privacy, copyright or intellectual property regulations, and that they have appropriate ethical approval where necessary.
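To make the idea of web-scraping concrete, the following minimal sketch pulls review text out of an HTML page that has already been obtained with permission. It uses only Python's standard library. The page structure (reviews held in `<div class="review">` elements), the class name `ReviewExtractor` and the sample HTML are assumptions made for this example; real sites differ, and downloading the page is deliberately left out.

```python
from html.parser import HTMLParser


class ReviewExtractor(HTMLParser):
    """Collect the text inside <div class="review"> elements (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._depth = 0  # number of open <div> tags inside the current review
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # nested div inside a review
        elif ("class", "review") in attrs:  # exact match on the class attribute
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the collected fragments and normalise whitespace.
                self.reviews.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_reviews(html):
    """Return the text of every review block found in the HTML string."""
    parser = ReviewExtractor()
    parser.feed(html)
    parser.close()
    return parser.reviews


# Sample page invented for this example.
SAMPLE_HTML = """
<html><body>
<div class="review">Worked well <b>after two weeks</b>.</div>
<div class="ad">Buy now</div>
<div class="review">Gave me headaches.</div>
</body></html>
"""
print(extract_reviews(SAMPLE_HTML))
```

The extracted strings could then be passed to whichever text analysis is planned, subject to the privacy, copyright and ethical checks described above.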

Journal: BMC Medical Research Methodology
Publication Date: Jul 31, 2021
Citations: 86
License type: open-access
