Abstract
Active learning is a common strategy for handling large-scale data with limited labeling effort. In each iteration of active learning, a query is posed to an oracle, for example asking for the label of a given unlabeled instance. In this way, labels are requested only for the most informative data, which reduces the labeling effort required from the oracle. We focus on pool-based active learning, where a set of unlabeled data is selected for querying in each run of active learning. To apply pool-based active learning to massive high-dimensional data, especially when the unlabeled set is much larger than the labeled set, we propose the APRAL and MLP strategies, which dramatically reduce the computation required for active learning while keeping the model's predictive power roughly the same. In APRAL, we avoid unnecessary re-ranking of the data under a given selection criterion for unlabeled data. To further improve efficiency, MLP organizes the unlabeled data into a multi-layer pool based on a dimensionality reduction technique, so that the data whose labels are most valuable to know are more likely to be stored in the top layers. With the APRAL and MLP strategies, the active learning computation time is reduced by about 83% compared with traditional active learning, while model effectiveness is preserved.
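For illustration only, the baseline pool-based loop described above can be sketched as follows. This is not APRAL or MLP, whose details the abstract does not give. It is a plain loop that re-ranks the entire pool in every round, which is the cost those strategies aim to reduce. The nearest-centroid model, the margin-based selection criterion, and all names (`active_learning`, `margin`, `oracle`, `batch_size`, `rounds`) are assumptions made for this example, and the seed set is assumed to contain at least two classes.

```python
import math
import random

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def margin(x, centroids):
    """Gap between the two nearest class centroids; a small gap means uncertain."""
    d = sorted(math.dist(x, c) for c in centroids)
    return d[1] - d[0]

def active_learning(labeled, pool, oracle, batch_size=5, rounds=3):
    """Hypothetical baseline loop.

    labeled: list of (x, y) pairs covering at least two classes (assumption)
    pool:    list of unlabeled points x
    oracle:  callable x -> y standing in for the human labeler
    """
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        # "Train" the stand-in model: one centroid per class seen so far.
        by_class = {}
        for x, y in labeled:
            by_class.setdefault(y, []).append(x)
        centroids = [centroid(xs) for xs in by_class.values()]
        # Re-rank the whole pool, most uncertain first (the expensive step).
        pool.sort(key=lambda x: margin(x, centroids))
        batch, pool = pool[:batch_size], pool[batch_size:]
        # Query the oracle only for the selected batch.
        labeled.extend((x, oracle(x)) for x in batch)
    return labeled, pool

# Usage on synthetic 2-D data; the label is the sign of the first coordinate.
random.seed(0)
oracle = lambda x: int(x[0] > 0)
pool = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
seed = [([-0.8, 0.0], 0), ([0.8, 0.0], 1)]
labeled, remaining = active_learning(seed, pool, oracle)
print(len(labeled), len(remaining))  # 17 185
```

Each round sorts every remaining pool item, so the ranking cost grows with the pool size even though only `batch_size` labels are requested per round.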