Recent studies have shown how synthetic data generation methods can be applied to electronic health records (EHRs) to produce synthetic versions that do not violate privacy regulations. This growing body of research has given rise to numerous methods for evaluating the quality of generated data, with new publications often introducing novel evaluation measures. This work presents a detailed review of synthetic EHRs, focusing on the evaluation methods used to assess the quality of the generated records. We discuss existing evaluation methods, offering insights into their use and interpreting the resulting metrics from the perspectives of fidelity, utility, and privacy. Furthermore, we highlight the key factors that influence the choice of evaluation method, such as the type of data (e.g., categorical, continuous, or discrete) and the level of application (e.g., patient level, cohort level, or feature level). To assess the effectiveness of current evaluation measures, we conduct a series of experiments that shed light on their potential limitations. The findings from these experiments reveal notable shortcomings: methods must be applied to the data carefully to avoid inconsistent evaluations, some assessments are qualitative and subject to individual judgment, clinical validation is often lacking, and techniques for evaluating temporal dependencies within the data are absent. These results underscore the need for greater emphasis on evaluation measures, their proper application, and the development of comprehensive evaluation frameworks, all of which are crucial for advancing this field.