Abstract
This study investigated whether data augmentation can improve dementia risk prediction with machine learning models. Recent studies have shown that basic blood tests are a cost-effective means of predicting cognitive function. However, developing models that address various conditions is challenging because of constraints associated with blood test results and cognitive assessments, including high costs, limited sample sizes, and missing data from tests not performed in certain facilities. Periodontal examination data have also emerged as a cost-effective screening tool, although they are often limited by small sample sizes. To address these challenges, this study explored the effectiveness of data augmentation using the Synthetic Minority Over-sampling Technique for Regression with Gaussian noise (SMOGN), a Generative Adversarial Network (GAN), and a Conditional Tabular GAN (CTGAN) on periodontal examination and blood test data. The datasets included cognitive assessment results from the Mini-Mental State Examination (MMSE), demographic characteristics, periodontal examination data, and blood test results. Linear regression models, random forests, and deep neural networks were used to evaluate the effectiveness of the synthesized data. This study used measured data from 108 participants, together with synthesized data generated from those measurements. External validity was evaluated using a separate dataset of 41 participants with missing items. The results suggested that standard GANs have an advantage in data diversity when examining models, whereas CTGANs preserve the structure and linear relationships of the measured tabular data, which substantially improves linear regression models. Importantly, because the synthesized data interpolate sparse regions of the distribution, such as age, models trained with the synthesized data maintained prediction accuracy for test data with extreme inputs.
These findings suggest that GAN-synthesized data can effectively address regression problems and improve dementia risk prediction.
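The idea of filling sparse regions of a small regression dataset can be illustrated with a minimal sketch. The code below is not SMOGN, a GAN, or a CTGAN as used in the study; it is a simplified interpolation-plus-Gaussian-noise scheme in the same spirit as SMOGN, written only with the Python standard library. The function name `augment_by_interpolation`, its parameters, and the `(age, MMSE)` rows are all hypothetical and made up for illustration; they are not data or code from the study.

```python
import random


def augment_by_interpolation(rows, n_new, noise_sd=0.05, k=3, seed=0):
    """Create n_new synthetic rows from rows of (feature..., target) floats.

    Simplified illustration only: pick a base row, pick one of its k nearest
    neighbours in feature space, interpolate between the two (target included),
    and add Gaussian noise to every value.
    """
    if len(rows) < 2:
        raise ValueError("need at least two rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)
        # Rank the other rows by squared Euclidean distance on the features
        # (every column except the last, which is the regression target).
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(r[:-1], base[:-1])),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(
                a + t * (b - a) + rng.gauss(0.0, noise_sd)
                for a, b in zip(base, other)
            )
        )
    return synthetic


# Hypothetical (age, MMSE score) rows, invented for this example.
# Note the gap between ages 74 and 88.
measured = [(62.0, 29.0), (65.0, 28.0), (70.0, 27.0), (74.0, 26.0), (88.0, 21.0)]
synthetic = augment_by_interpolation(measured, n_new=20)
noise_free = augment_by_interpolation(measured, n_new=20, noise_sd=0.0)
```

In practice the features would usually be standardized before computing distances, since columns such as age and blood test values are on different scales. A generative model such as a CTGAN learns the joint distribution instead of interpolating between neighbouring pairs, so this sketch only conveys the general motivation for augmenting sparse regions.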