Extraction of Meaningful Information from Unstructured Clinical Notes Using Web Scraping

Varshini K Sukanya, Uthra R Annie

doi:10.1142/s021812662350041x

Abstract

In the medical field, clinical notes written by doctors, nurses, and other medical practitioners are considered among the most important medical documents. They record information about the patient, including current condition, family history, diseases, symptoms, medications, lab test reports, and other vital details. Despite the value of this information, the notes cannot readily be used because the data are unstructured. Organizing such a large volume of data without error is practically impossible for humans, yet ignoring unstructured data is not advisable either. To address this, web scraping is used to extract clinical notes from the Medical Transcription (MT) samples, a collection of transcribed clinical notes from various medical departments. In the proposed method, Natural Language Processing (NLP) is used to pre-process the data, and variants of the Term Frequency-Inverse Document Frequency (TF-IDF) vector model are used for feature selection, so that the required information can be extracted from the clinical notes. Accuracy, precision, recall, and F1 score are used to evaluate disease identification, and the results of the proposed system are compared across four of the best-performing machine learning algorithms: Logistic Regression, Multinomial Naive Bayes, Random Forest classifier, and Linear SVC. The results show that the Random Forest classifier achieved a higher accuracy (90%) than the other algorithms.
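To make the feature-weighting and evaluation steps more concrete, here is a minimal, standard-library-only Python sketch of two generic ingredients named in the abstract: TF-IDF weighting of pre-processed notes, and the accuracy, precision, recall, and F1 measures. This is not the authors' pipeline. The regex tokenizer, the smoothed IDF formula, the helper names, and the toy notes are all assumptions chosen for illustration. The paper's specific TF-IDF variants, its scraping of the MT samples, and its four classifiers are not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Hypothetical pre-processing: lowercase and keep runs of letters only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per document.

    Assumed variant: raw term counts times a smoothed IDF,
    log((1 + N) / (1 + df)) + 1, which is the same IDF formula scikit-learn
    uses by default. Unlike scikit-learn's default, no L2 normalization is applied.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents containing each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class, treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy notes for illustration only (not taken from the MT samples).
notes = [
    "Patient reports chest pain and shortness of breath",
    "Chest X-ray shows no acute findings",
    "Patient denies chest pain",
]
vectors = tfidf(notes)
```

In this sketch, a term that appears in every note, such as "chest" here, receives the minimum IDF of 1. Rarer terms such as "patient" are weighted more heavily, and this difference is what lets TF-IDF features separate documents from one another. A classifier such as the Random Forest mentioned in the abstract would then be trained on vectors like these, and its predictions would be scored per disease class with `precision_recall_f1`.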

Published in: Journal of Circuits, Systems and Computers

Similar Papers

1287-P: Identifying When Incident Diabetes Was Diagnosed in Children and Young Adults, Using Natural Language Processing of Clinical Notes
Anthony Wong ... Marc Rosenman
Diabetes | VOL. 72 | 20 Jun 2023

Identifying Diabetes in Clinical Notes in Hebrew: A Novel Text Classification Approach Based on Word Embedding
Maxim Topaz ... Nadav Furie
Studies in health technology and informatics | VOL. 264 | 21 Aug 2019

THU0612 Development and Validation of An Accurate and Precise Natural Language Processing System To Capture Rheumatoid Arthritis Disease Activity Measures
G.W Cannon ... B.C Sauer
Annals of the Rheumatic Diseases | VOL. 75 | 01 Jun 2016

Comparison of Diagnosis Codes to Clinical Notes in Classifying Patients with Diabetic Retinopathy
Sean Yonamine ... Catherine Q Sun
Ophthalmology Science | VOL. 4 | 01 Jun 2024