Abstract

Objective: Training data fuels and shapes the development of AI models. Intensive data requirements are a major bottleneck limiting the success of AI tools in sectors where data is inherently scarce. In healthcare, training data is difficult to curate, triggering growing concerns that the current lack of access to healthcare among under-privileged social groups will translate into future bias in healthcare AIs. In this report, we developed an autoencoder to grow and enhance inherently scarce datasets and alleviate our dependence on big data.

Design: Computational study with open-source data.

Subjects: The data was obtained from six open-source datasets comprising patients aged 40 to 80 in Singapore, China, India, and Spain.

Methods: The reported framework generates synthetic images based on real-world patient imaging data. As a test case, we used an autoencoder to expand publicly available training sets of optic disc photographs, and evaluated the ability of the resulting datasets to train AI models to detect glaucomatous optic neuropathy.

Main Outcome Measures: The area under the ROC curve (AUC) was used to evaluate the performance of the glaucoma detector; a higher AUC indicates better detection performance.

Results: Enhancing datasets with synthetic images generated by the autoencoder produced superior training sets that improved the performance of AI models.

Conclusions: Our findings help address the increasingly untenable data volume and quality requirements for AI model development and have implications beyond healthcare, toward empowering AI adoption in all similarly data-challenged fields.
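The abstract does not specify the autoencoder architecture or augmentation procedure, but the general idea — train an autoencoder on real images, perturb the learned latent codes, and decode them into synthetic samples that enlarge the training set — can be sketched as follows. This is a minimal illustrative NumPy sketch, not the authors' implementation: the toy 8×8 "images", the single-hidden-layer network, and the Gaussian latent perturbation are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a small imaging dataset: 200 flattened 8x8 "photos".
# (Hypothetical data; real optic disc photos would be far larger.)
X = rng.random((200, 64))

# One-hidden-layer autoencoder: 64 -> 16 (latent) -> 64.
W1 = rng.normal(0.0, 0.1, (64, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.1, (16, 64)); b2 = np.zeros(64)

def encode(x):
    return np.tanh(x @ W1 + b1)          # latent code

def decode(z):
    return z @ W2 + b2                   # reconstructed image

init_mse = np.mean((decode(encode(X)) - X) ** 2)

# Train by plain gradient descent on mean-squared reconstruction error.
lr = 0.05
for _ in range(300):
    Z = encode(X)
    err = decode(Z) - X                  # reconstruction residual
    gW2 = Z.T @ err / len(X)
    gb2 = err.mean(axis=0)
    dZ = (err @ W2.T) * (1.0 - Z ** 2)   # backprop through tanh
    gW1 = X.T @ dZ / len(X)
    gb1 = dZ.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

final_mse = np.mean((decode(encode(X)) - X) ** 2)

# Generate synthetic images: jitter real latent codes, then decode.
Z_synth = encode(X) + rng.normal(0.0, 0.05, (200, 16))
X_synth = decode(Z_synth)

# The enhanced training set combines real and synthetic samples.
augmented = np.vstack([X, X_synth])
```

In practice the downstream classifier (here, a glaucoma detector) would be trained on `augmented` instead of `X` alone, and its AUC on a held-out test set compared against the baseline trained on real images only.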
