Abstract

This work presents research aimed at improving classification performance when real-world training data are used to forecast the likelihood of disease. Currently available optimisation techniques can provide highly efficient and accurate results; however, their performance potential is often restricted when training resources are limited. This work proposes a novel approach, Synthetic Instance Model Optimisation (SIMO), which combines Sequential Model-based Algorithm Configuration (SMAC) optimisation with the Synthetic Minority Over-sampling Technique (SMOTE) to improve optimised prediction modelling. The SIMO approach generates additional synthetic instances from a limited training sample while simultaneously aiming to increase the performance of the best-performing algorithm. The results offer a partial solution for improving optimum algorithm performance when training resources are sparse. With SIMO, noticeable improvements in accuracy and F-measure were achieved over standalone SMAC optimisation. Comparing collective training data with SIMO instance optimisation showed significant improvement, including individual accuracy increases of up to 46% and a mean overall increase of 13.96% across all 240 configurations over standard SMAC optimisation.
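
The synthetic-instance step mentioned in the abstract can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not the authors' implementation: the abstract gives no implementation details, so the function name `smote_like_oversample`, the parameters `n_new`, `k` and `seed`, and the toy data are all assumptions made for illustration. The sketch covers only the over-sampling idea (a new point is placed on the line segment between a minority sample and one of its nearest minority neighbours) and does not attempt the SMAC configuration search.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        # Place the new point on the segment between base and neighbour.
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Toy minority-class sample with two features (assumed data).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like_oversample(minority, n_new=6)
```

Because every synthetic point is an interpolation between two existing minority samples, the generated instances stay inside the region already spanned by the minority class; under this assumption, the enlarged sample would then be passed to whatever model configuration search is in use.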
