Active Learning-Based Data Collection in Crowd Replication

Lulu Gao, Shin’ichi Konomi

doi:10.1007/978-3-030-73113-7_5

Abstract

Crowd replication, which combines crowd sensing, direct observation, and mathematical modeling to enable efficient and accurate evaluation of crowds, is a low-effort, easy-to-adopt, and cost-effective mechanism for crowd data collection and analysis. In crowd replication, the quality of data collection is particularly important, and it depends on how representative the sample is of the target population. The two main target selection strategies, population-based sampling and cluster sampling, are labor-intensive and time-consuming when stable, reliable, and valid data are required. Therefore, this paper proposes a novel data collection method for crowd replication based on active learning, a modern machine learning technique, which aims to reduce the sample size and complexity and to increase the accuracy of the data tasks as much as possible with less data. We apply active learning to obtain a dataset with high representativeness and informativeness. Experimental results show that, compared with traditional probability-based sampling strategies, our method captures more representative samples and datasets more stably.

Keywords

Crowd replication, Data collection, Active learning

Full Text