Abstract

To investigate and implement semiautomated screening for meta-analyses (MA) in urology under consideration of class imbalance. Machine learning algorithms were trained on data from three MA with detailed information of the screening process. Different methods to account for class imbalance (Sampling (up- and downsampling, weighting and cost-sensitive learning), thresholding) were implemented in different machine learning (ML) algorithms (Random Forest, Logistic Regression with Elastic Net Regularization, Support Vector Machines). Models were optimized for sensitivity. Besides metrics such as specificity, receiver operating curves, total missed studies, and work saved over sampling were calculated. During training, models trained after downsampling achieved the best results consistently among all algorithms. Computing time ranged between 251 and 5834s. However, when evaluated on the final test data set, the weighting approach performed best. In addition, thresholding helped to improve results as compared to the standard of 0.5. However, due to heterogeneity of results no clear recommendation can be made for a universal sample size. Misses of relevant studies were 0 for the optimized models except for one review. It will be necessary to design a holistic methodology that implements the presented methods in a practical manner, but also takes into account other algorithms and the most sophisticated methods for text preprocessing. In addition, the different methods of a cost-sensitive learning approach can be the subject of further investigations.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call