Abstract

Background: The selection, development, or comparison of machine learning methods in data mining can be a difficult task, depending on the target problem and the goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists.

Results: The present study introduces an accessible, curated, and developing public benchmark resource to facilitate identification of the strengths and weaknesses of different machine learning methodologies. We compare meta-features among the current set of benchmark datasets in this resource to characterize the diversity of available data. Finally, we apply a number of established machine learning methods to the entire benchmark suite and analyze how datasets and algorithms cluster in terms of performance. From this study, we find that existing benchmarks lack the diversity needed to properly evaluate machine learning algorithms, and that several gaps in benchmarking problems still need to be addressed.

Conclusions: This work represents another important step towards understanding the limitations of popular benchmarking suites and developing a resource that connects existing benchmarking standards to more diverse and efficient standards in the future.

Highlights

  • The term benchmarking is used in machine learning (ML) to refer to the evaluation and comparison of ML methods regarding their ability to learn patterns in ‘benchmark’ datasets that have been applied as ‘standards’

  • To provide a basis for comparison, we evaluated 13 supervised ML classification methods from scikit-learn [22] on the 165 datasets in the Penn Machine Learning Benchmark (PMLB)

  • We evaluated the ML methods using balanced accuracy [25, 26] as the scoring metric, a normalized version of accuracy that accounts for class imbalance by calculating accuracy on a per-class basis and then averaging the per-class accuracies (a short code sketch follows this list)
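
A minimal sketch of this evaluation setup is shown below, assuming the pmlb Python package is installed (pip install pmlb). The 'mushroom' dataset, the random-forest classifier, and the single train/test split are illustrative choices, not the paper's full 13-method, 165-dataset pipeline.

    import numpy as np
    from pmlb import fetch_data
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import balanced_accuracy_score
    from sklearn.model_selection import train_test_split

    # Fetch one PMLB classification dataset as a feature matrix X and labels y.
    X, y = fetch_data('mushroom', return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42)

    clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
    y_pred = clf.predict(X_test)

    # Balanced accuracy is the mean of the per-class accuracies (recalls),
    # so minority classes count as much as majority classes.
    score = balanced_accuracy_score(y_test, y_pred)

    # Equivalent manual computation of the same quantity.
    per_class = [(y_pred[y_test == c] == c).mean() for c in np.unique(y_test)]
    assert np.isclose(score, np.mean(per_class))
    print(f'Balanced accuracy: {score:.3f}')

Looping the same pattern over pmlb's classification_dataset_names and a list of scikit-learn classifiers reproduces the shape of the benchmark comparison described above.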


Introduction

The term benchmarking is used in machine learning (ML) to refer to the evaluation and comparison of ML methods with respect to their ability to learn patterns in 'benchmark' datasets that have been applied as 'standards'. Comparisons can be made over a range of evaluation metrics, e.g., power to detect signal, prediction accuracy, computational complexity, and model interpretability. Benchmarking in this way is important for demonstrating new methodological capabilities and for guiding the selection of an appropriate ML method for a given problem. The selection, development, or comparison of machine learning methods in data mining can be a difficult task, depending on the target problem and the goals of a particular study. Numerous publicly available real-world and simulated benchmark datasets have emerged from different sources, but their organization and adoption as standards have been inconsistent. As such, selecting and curating specific benchmarks remains an unnecessary burden on machine learning practitioners and data scientists.
