Electronic phenotyping of heart failure from a national clinical information database

M Nakayama, R Inoue

doi:10.1093/ehjci/ehaa946.3486

Abstract

Introduction

The medical information database network (MID-NET), a database of clinical information collected from several medical institutions, including national university hospitals and private hospital groups, has been available to the public in Japan since 2018. To analyse clinical events, i.e., to perform electronic phenotyping, it is important to extract data from clinical information correctly, combine multiple pieces of information, and define the target disease. Here, we investigated a method to identify patients with heart failure and validated the findings using MID-NET data.

Methods

A criterion to describe heart failure cases was determined according to clinical guidelines released by the Japanese Circulation Society. The study used records from April 1 to December 31, 2013. The initial rule was based on disease names, examinations, and medications pertaining to heart failure. We extracted and analysed clinical data from MID-NET and identified patients with heart failure. Two doctors, including a cardiologist, reviewed the medical records and verified whether each case was a true case of heart failure, following which we calculated precision and recall rates. Next, we used machine learning with XGBoost in R to identify factors that help extract true cases correctly.

Results

A total of 5,282 cases were extracted via disease names related to heart failure. Of these, 2,799 cases met the initial rule, and 200 cases were randomly sampled and assessed. A total of 70 cases were found to be true. Thus, a precision rate of 0.350 and a recall rate of 0.912 were determined. The machine learning analysis showed that heart failure was associated with several factors, including the serum B-type natriuretic peptide (BNP) value, the relationship between the commencement date of the disease and the actual hospitalisation date, and medications for the treatment of heart failure.
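
The reported precision is consistent with 70 true cases among 200 reviewed cases. As a minimal sketch, the arithmetic can be checked in Python (an assumption made for illustration; the study itself used R). The function names are illustrative, and the recall of 0.912 is taken as reported because the abstract does not give the counts behind it. The F-measure here is the harmonic mean of precision and recall.

```python
def precision(true_cases: int, reviewed: int) -> float:
    """Fraction of reviewed cases that were confirmed as true cases."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    return true_cases / reviewed


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision p and recall r (the F1 score)."""
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


initial_precision = precision(70, 200)  # 70 true cases among 200 reviewed
initial_recall = 0.912                  # as reported; underlying counts not given
initial_f = f_measure(initial_precision, initial_recall)

print(f"precision={initial_precision:.3f}, F-measure={initial_f:.3f}")
```

The same `f_measure` helper can be applied to any other precision and recall pair to compare rules on a single scale.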
Using these data, we determined the conditions that improved the validity of heart failure case identification. Specifically, cases were extracted when the disease name was related to heart failure and hospitalisation occurred within two weeks after the commencement date of the disease. The candidates were then categorised into three groups according to serum BNP values (high, middle, and low ranges). The high group was labelled “heart failure”, and the low group was excluded. In the middle group, candidates were additionally categorised according to their prescribed medication for heart failure. With this refined rule, the precision rate increased to 0.878 while the recall rate decreased to 0.697. The F-measure increased from 0.506 to 0.777.

Conclusions

To find target cases in a large clinical database, precise electronic phenotyping is required. A machine learning method can enable accurate identification of patients with heart failure. Leveraging large amounts of clinical data may be beneficial for medical research progress.

Funding Acknowledgement

Type of funding source: Public grant(s) – National budget only. Main funding source(s): Japan Agency for Medical Research and Development.
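
The refined rule described in the Results above can be sketched as a small decision function. This is an illustration in Python under stated assumptions, not the authors' implementation: the `Candidate` fields, the `classify` function, and the BNP cut-offs `bnp_low` and `bnp_high` are hypothetical, since the abstract does not report the thresholds. It also assumes that a middle-range candidate is labelled heart failure only when a heart failure medication was prescribed, that admission on the commencement date itself counts as within two weeks, and that a missing BNP value excludes the case.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Candidate:
    has_hf_disease_name: bool   # disease name related to heart failure
    disease_start: date         # commencement date of the disease
    admission: Optional[date]   # hospitalisation date, if any
    bnp: Optional[float]        # serum BNP value, if measured
    on_hf_medication: bool      # prescribed a heart failure medication


def classify(c: Candidate, bnp_low: float, bnp_high: float) -> bool:
    """Return True if the candidate is labelled as heart failure.

    bnp_low and bnp_high are hypothetical cut-offs separating the low,
    middle, and high BNP ranges; the abstract does not report them.
    """
    if not c.has_hf_disease_name or c.admission is None:
        return False
    # Hospitalisation within two weeks after the commencement date.
    delay = c.admission - c.disease_start
    if not (timedelta(0) <= delay <= timedelta(days=14)):
        return False
    if c.bnp is None:
        return False  # assumption: an ungraded case is excluded
    if c.bnp >= bnp_high:
        return True   # high range: labelled heart failure
    if c.bnp < bnp_low:
        return False  # low range: excluded
    # Middle range: assumed to be decided by medication alone.
    return c.on_hf_medication


# Illustrative values only; the cut-offs below are placeholders.
example = Candidate(True, date(2013, 5, 1), date(2013, 5, 8), 150.0, True)
print(classify(example, bnp_low=100.0, bnp_high=200.0))
```

Passing the cut-offs as arguments keeps the unreported thresholds out of the logic, so the same function can be re-run with whatever ranges a given study defines.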
