Abstract

Active learning, a subfield of machine learning, aims to train a good model while labeling as few samples as possible. In many machine learning scenarios, the needed information (such as the best value in an unlabeled dataset) is obtained by prediction, and when the training set is small, prediction accuracy directly limits the accuracy of the result. To build a high-performance regression model from a small dataset while simultaneously accelerating the search for the best sample, a new active learning query strategy, EGO-ALR, which combines efficient global optimization (EGO) with active learning for regression (ALR), is proposed. EGO-ALR performed significantly better than the original ALR query strategies in terms of root mean square error (RMSE), correlation coefficient (CC), and opportunity cost (Oppo Cost). Specifically, EGO-ALR increased the Oppo Cost by an average of 25.27% while its RMSE and CC values differed from the original ALR by no more than 1.07%. The efficiency and robustness of EGO-ALR were validated on 19 datasets from various domains using three distinct linear regression models (ridge regression, lasso, and elastic net).
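
The abstract does not spell out the exact acquisition function, but the following Python sketch illustrates the general idea of an EGO-ALR-style query: blend an expected-improvement term (the EGO part) with a diversity term toward unlabeled regions (a common ALR criterion). The function name `ego_alr_query`, the bootstrap ensemble used to estimate predictive uncertainty for a linear model, and the `alpha` blending weight are all illustrative assumptions, not the paper's formulation.

```python
# Hypothetical sketch of an EGO-ALR-style query strategy (not the
# paper's exact method). Uncertainty for a linear model (Ridge) is
# estimated with a bootstrap ensemble; the acquisition blends expected
# improvement (EGO) with distance to already-labeled points (ALR-style
# diversity).
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import Ridge
from sklearn.metrics.pairwise import euclidean_distances


def ego_alr_query(X_lab, y_lab, X_pool, n_models=30, alpha=0.5, seed=None):
    """Return the index of the pool sample to label next.

    alpha weighs expected improvement against diversity; both the
    ensemble size and the blending are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)

    # Bootstrap ensemble of ridge models to get a predictive mean and std.
    preds = np.empty((n_models, len(X_pool)))
    for m in range(n_models):
        idx = rng.integers(0, len(X_lab), len(X_lab))  # resample with replacement
        preds[m] = Ridge().fit(X_lab[idx], y_lab[idx]).predict(X_pool)
    mu, sigma = preds.mean(axis=0), preds.std(axis=0) + 1e-12

    # Expected improvement over the best observed label (maximization).
    best = y_lab.max()
    z = (mu - best) / sigma
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

    # Diversity: distance from each pool point to its nearest labeled point.
    dist = euclidean_distances(X_pool, X_lab).min(axis=1)

    # Normalize both terms to [0, 1] and blend into one acquisition score.
    ei = ei / (ei.max() + 1e-12)
    dist = dist / (dist.max() + 1e-12)
    return int(np.argmax(alpha * ei + (1 - alpha) * dist))
```

In a pool-based loop, the returned index would be labeled (e.g., by running the experiment it represents), moved from the pool to the training set, and the query repeated until the labeling budget is exhausted.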
