Abstract

The amount of medical text data is increasing dramatically. Medical text data record the progress of medicine and contain a large amount of medical knowledge. Because they are written in natural language, they are semistructured, high-dimensional, and high in volume, and they cannot participate directly in arithmetic operations. Extracting useful knowledge or information from the available data is therefore a very important task, and data mining techniques can extract such knowledge. In the current study, we reviewed different approaches to medical text data mining. We analyzed the advantages and shortcomings of each technique at the different stages of processing medical text data. We also explored applications of the algorithms, to give users insight and help them apply these resources to specific challenges in medical text data. Further, we discussed the main challenges in medical text data mining. The findings of this paper can help researchers choose reasonable techniques for mining medical text data and make them aware of the main challenges in the field.

Highlights

  • Journal of Healthcare Engineering [2]

  • Medical text data are an important part of medical big data. They are described in natural language, cannot participate in arithmetic operations, and are semistructured, high-dimensional, and high in volume [3]. They cannot be readily applied in research because they have no fixed writing format and are highly specialized [4].

  • Data mining, defined in the “First section of the 1995 International Conference on Knowledge Discovery and Data Mining,” has been widely used in auxiliary disease diagnosis, drug development, hospital information systems, and genetic medicine to facilitate medical knowledge discovery [9,10,11,12].

Summary

Medical Text Data Mining

Data mining, defined in the “First section of the 1995 International Conference on Knowledge Discovery and Data Mining,” has been widely used in auxiliary disease diagnosis, drug development, hospital information systems, and genetic medicine to facilitate medical knowledge discovery [9,10,11,12]. This study summarized the algorithms and tools for medical text data based on the four steps of data mining. Topaz et al. [27] used an NLP-based classification system, support vector machine (SVM), recurrent neural network (RNN), and other machine learning methods to identify diabetic patients from clinical records and reduce the manual workload in medical text data mining. An Artificial Neural Network (ANN) is a nonlinear prediction model that is learned by training. In data mining, its advantages are accurate classification, self-learning, associative memory, high-speed search for the optimal solution, and good stability. Unlike traditional artificial intelligence and information processing technology, ANN overcomes the drawbacks of logic-symbol-based approaches in processing intuitive and unstructured information, and it is self-adaptive, self-organizing, and capable of real-time learning. It can complete data classification, feature mining, and other mining tasks.
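To make the idea of learning a classifier from clinical text concrete, the following is a minimal sketch, not the method of Topaz et al. [27] or of any tool reviewed here. It assumes a bag-of-words representation and swaps in a single-neuron perceptron, which is the simplest trainable neural unit and has a linear decision boundary, in place of the SVM, RNN, or multilayer ANN discussed above. The notes, labels, and all function names are invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a note and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(notes):
    """Map each distinct token to a column index."""
    vocab = {}
    for note in notes:
        for tok in tokenize(note):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(note, vocab):
    """Turn a note into a count vector; tokens outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for tok, count in Counter(tokenize(note)).items():
        if tok in vocab:
            vec[vocab[tok]] = count
    return vec


def score(x, w, b):
    """Weighted sum of the inputs plus the bias term."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_perceptron(X, y, epochs=20):
    """Perceptron rule: adjust weights only when an example is misclassified."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            pred = 1 if score(x, w, b) > 0 else 0
            err = label - pred  # +1, 0, or -1
            if err:
                w = [wi + err * xi for wi, xi in zip(w, x)]
                b += err
    return w, b


def predict(note, vocab, w, b):
    """Return 1 if the note is classified as diabetes-related, else 0."""
    return 1 if score(vectorize(note, vocab), w, b) > 0 else 0


# Hypothetical toy notes (1 = mentions diabetes care, 0 = unrelated).
NOTES = [
    "patient has type 2 diabetes and takes metformin",
    "elevated glucose, insulin started for diabetes",
    "fracture of the left wrist after a fall",
    "seasonal allergy with mild cough",
]
LABELS = [1, 1, 0, 0]

VOCAB = build_vocab(NOTES)
W, B = train_perceptron([vectorize(n, VOCAB) for n in NOTES], LABELS)
```

Calling `predict("metformin continued for diabetes", VOCAB, W, B)` classifies a new note with the learned weights. The vectorization step is what turns free text, which cannot enter arithmetic directly, into numbers a model can train on; a real system would need far more data, a richer representation, and proper evaluation.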
