Abstract

There is currently no unified structured standard for describing radiological chest examinations. Developing such report templates is difficult because of the diversity of instrumental methods, the variety of diagnostic objectives, and the specific working practices of individual medical organizations. Tools for labeling unstructured radiological chest examination protocols can improve electronic document management in healthcare by automating data formalization, and they can also be used to build data sets for machine learning. The purpose of this study is to develop a system for automated labeling of unstructured text reports of radiological chest examinations using a heuristic approach and machine learning algorithms.

Material and methods. The study used patient data on radiological chest examinations from medical organizations connected to the Unified Radiological Information Service of the Unified Medical Information and Analysis System of inpatient and outpatient medical organizations of Moscow and the Moscow region. Semantic analysis methods, expert rules, and machine learning algorithms were used to process the unstructured text reports.

Results. The study identified language patterns associated with important pathological conditions and with the “norm” class, and regular expressions were developed for these classes. A dictionary of radiological concepts and abbreviations (397 items) was compiled, followed by the development of an algorithm for correcting grammar mistakes in the protocols. In collaboration with the expert group, rules for multilabel classification of the radiological examination protocols were created and their performance was tested. When the multilabel classification problem was solved using only the expert rules, the percentage of exact matches was 84%. Because the classifiers for the “infiltration/consolidation” and “blackout focus” conditions were not effective, we tuned machine learning models.

Conclusion. The best classification results were obtained with a recurrent neural network with the long short-term memory (LSTM) architecture, which achieved a sensitivity of 89% and 99% for the “infiltration/consolidation” and “blackout focus” classes, respectively. This produced a statistically significant (p=0.039) increase in the total percentage of exact matches, to 87%.
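
To make the "percentage of exact matches" concrete, the following is a minimal sketch of rule-based multilabel labeling with regular expressions. Only the three class names come from the abstract. The English patterns, the `label_report` and `exact_match_ratio` helpers, and the sample reports are hypothetical illustrations, not the study's actual rules or data. Negation handling and the grammar-correction step are omitted.

```python
import re

# Hypothetical English patterns, one per class. The class names come from the
# abstract; the patterns themselves are illustrative assumptions only.
RULES = {
    "norm": re.compile(r"\b(lungs are clear|no pathological changes)\b", re.I),
    "infiltration/consolidation": re.compile(r"\b(infiltrat\w*|consolidation)\b", re.I),
    "blackout focus": re.compile(r"\bfocal (opacity|shadow)\b", re.I),
}


def label_report(text):
    """Return the set of class labels whose pattern occurs in the report."""
    return frozenset(name for name, pattern in RULES.items() if pattern.search(text))


def exact_match_ratio(predicted, expected):
    """Share of reports whose predicted label set equals the expert label set."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)


# Made-up sample reports with made-up expert labels.
reports = [
    "Lungs are clear, no pathological changes.",
    "Infiltration in the right lower lobe with a focal opacity.",
    "Consolidation in the left upper lobe.",
    "Focal shadow in S6; infiltrative changes not excluded.",
]
expert_labels = [
    frozenset({"norm"}),
    frozenset({"infiltration/consolidation", "blackout focus"}),
    frozenset({"infiltration/consolidation"}),
    frozenset({"blackout focus"}),
]

predicted_labels = [label_report(r) for r in reports]
print(exact_match_ratio(predicted_labels, expert_labels))  # 0.75
```

A report counts as an exact match only when every label agrees with the expert annotation, so the last sample report, where the rules add a second label, counts as a miss even though one label is correct.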

