Electronic medical records (EMRs) store information about patients collected during their stay in health structures. The data stored in EMRs range from results collected from biological laboratories to textual descriptions of diseases and diagnostic device results (e.g., biomedical images). Each EMR is linked to a diagnosis related group (DRG) record. A DRG record is associated with a citizen who has been treated in a hospital. It contains a code, called the major diagnostic category (MDC), which summarizes the treated disease and allows the costs of the patient's treatment during the stay to be reimbursed. DRGs are used for administrative processes (e.g., cost and reimbursement management) as well as for disease monitoring. Associating diagnostic codes with external information (such as environmental and geographical data) and with information filtered from EMRs (e.g., biological results or analyte values) can be useful for monitoring citizens' wellness status. We propose a methodology to analyze such data based on a multistep process. First, we cross-reference data using a semantics-based clustering procedure, extract information from EMRs, and cluster the records by looking for similar patterns of diseases. Then, the biological records in each disease cluster are analyzed to evaluate intracluster similarity by selecting analyte typologies and values. Finally, biological data are related to diagnosis codes and geometrically projected onto areas of interest in order to map the calculated outlier patients. We applied the methodology to two case studies: 1) diagnosis codes and biochemical analytes from 20,000 biological analyses of patients hospitalized during one observation year and 2) the correlation between cardiovascular diseases and water quality in a southern Italian region. Preliminary findings suggest that the method is effective.
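The last two steps of the process described above (intracluster analysis of analyte values and identification of outlier patients per area) can be illustrated with a minimal sketch. It is not the authors' implementation: it replaces the semantics-based clustering with a plain grouping by MDC code, considers a single analyte value per record, and uses a simple z-score rule. The record fields (`patient_id`, `mdc`, `area`, `value`), the function name, and the threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flag_outliers(records, threshold=2.0):
    """Group records by diagnosis code and flag unusual analyte values.

    Assumption: each record is a dict with the hypothetical keys
    'patient_id', 'mdc', 'area', and 'value' (one analyte measurement).
    Grouping by MDC code stands in for the paper's clustering step.
    """
    # Step 1 (simplified): one cluster per diagnosis code.
    clusters = defaultdict(list)
    for rec in records:
        clusters[rec["mdc"]].append(rec)

    outliers = []
    for code, members in clusters.items():
        values = [m["value"] for m in members]
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation
        if sigma == 0:
            continue  # all values identical: nothing to flag

        # Step 2 (simplified): z-score of each value within its cluster.
        for m in members:
            if abs(m["value"] - mu) / sigma > threshold:
                # Keep the area so outliers can later be placed on a map.
                outliers.append((m["patient_id"], m["area"], code))
    return outliers
```

Calling `flag_outliers(records)` returns `(patient_id, area, mdc)` tuples, which could then be aggregated by area for mapping. A real analysis would need a more robust similarity measure and several analytes per patient.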