Abstract

Medical records contain many terms that are difficult to process. Our aim in this study is to enable visual exploration of medical databases whose texts present a large number of syntactic variations and abbreviations, using an interface that facilitates content identification, navigation, and information retrieval. We propose multi-term tag clouds as content representation tools and as aids for browsing and querying tasks. The tag clouds are generated with a novel mathematical method that keeps related terms grouped together within the tags. To evaluate this proposal, we carried out a survey on a Spanish database with 24,481 records: 23 expert users in the medical field were asked to test the interface and answer questions about the properties of the generated tag clouds. In addition, we obtained a precision of 0.990, a recall of 0.870, and an F1-score of 0.904 when evaluating the tag cloud as an information retrieval tool. The main contribution of this approach is that we automatically generate a visual interface over the text that captures the semantics of the information and facilitates access to medical records, and that obtained a high degree of satisfaction in the evaluation survey.

Highlights

  • Large amounts of data are collected every day. To be useful, data must be processed, which is a complex task [1]

  • In the medical field, large amounts of data are collected every day

  • We generated a tag cloud from the attribute “Proposed Intervention”. This attribute is especially complex since it contains a large number of syntactic variations, and its content was entered in natural language by different practitioners

Summary

Introduction

Large amounts of data are collected every day. To be useful, data must be processed, which is a complex task [1]. Textual information is particularly hard to process, since a text may contain a large number of syntactic variations or even mistakes, and it is usually entered by different people with different writing patterns. Information extraction is the task of obtaining structured semantic relationships from unstructured text [16]. One of the simplest methods for identifying relationships between entities is applied in [19], where co-occurrence statistics are used to calculate the degree of association between diseases and drugs in clinical records. Most approaches based on co-occurrences achieved high recall and low accuracy: recall is a model's ability to find all the relevant cases within a dataset, and accuracy is the number of relevant cases found divided by the total number of cases.
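As a rough illustration of the co-occurrence idea, the following Python sketch counts how often a disease and a drug are mentioned in the same record and then scores the resulting candidate pairs. The toy records, the reference set `gold`, and all function names are hypothetical and are not taken from [19]. The sketch also assumes that "the total number of cases" refers to the cases the method returned.

```python
from collections import Counter
from itertools import product

# Hypothetical toy data: disease and drug mentions found in each clinical record.
records = [
    {"diseases": {"hypertension"}, "drugs": {"enalapril"}},
    {"diseases": {"hypertension", "diabetes"}, "drugs": {"enalapril", "metformin"}},
    {"diseases": {"diabetes"}, "drugs": {"metformin"}},
    {"diseases": {"asthma"}, "drugs": {"salbutamol", "metformin"}},
]

# Hypothetical reference set of truly related (disease, drug) pairs.
gold = {
    ("hypertension", "enalapril"),
    ("diabetes", "metformin"),
    ("asthma", "salbutamol"),
}


def cooccurrence_counts(records):
    """Count, for every (disease, drug) pair, the records that mention both."""
    counts = Counter()
    for rec in records:
        for pair in product(sorted(rec["diseases"]), sorted(rec["drugs"])):
            counts[pair] += 1
    return counts


def candidate_pairs(counts, min_count=1):
    """Keep the pairs that co-occur in at least min_count records."""
    return {pair for pair, n in counts.items() if n >= min_count}


def recall(found, relevant):
    """Share of the relevant pairs that were found."""
    return len(found & relevant) / len(relevant) if relevant else 0.0


def fraction_relevant(found, relevant):
    """Share of the returned pairs that are relevant (assumed reading of 'accuracy')."""
    return len(found & relevant) / len(found) if found else 0.0


counts = cooccurrence_counts(records)
found = candidate_pairs(counts, min_count=1)
print(recall(found, gold), fraction_relevant(found, gold))
```

With these toy records, accepting every co-occurring pair recovers all relevant pairs but also returns several unrelated ones, which mirrors the high-recall, low-accuracy pattern described above. Raising `min_count` to 2 returns fewer unrelated pairs at the cost of missing one relevant pair.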

