System call sequences, which capture the runtime behavior of an application, are particularly useful for anomaly detection in mobile applications. However, one of the main obstacles in this area is the lack of publicly available, high-quality datasets. Because mobile platforms have limited computational power and storage, a single mobile device cannot install applications at scale and record their interactions with the operating system, which makes building large-scale, fine-grained system call datasets extremely challenging. In this paper, we present the MaDroid dataset. It is the first comprehensive dataset and benchmark for anomaly detection in mobile applications that combines high-dimensional feature sequence data with maliciousness information, and the first to incorporate VirusTotal (VT) rating values into its features. It is built with an automated collection framework that collects fine-grained system call sequences from simulation environments for both normal and malicious mobile applications. The dataset is 457 GB in size and consists of 50,429 labeled system call sequences. It covers mobile applications released at different times over the past 14 years, and the selected applications span 10 mainstream app markets. We extract different feature subsets from the dataset and evaluate them using random forest (RF), multilayer perceptron (MLP), and gradient-boosted decision tree (GBDT) classifiers to demonstrate their effectiveness and accuracy in detecting malicious mobile apps. Our dataset can be a useful resource for the security and machine learning communities.
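
To illustrate how a feature subset might be derived from a system call sequence before it is passed to a classifier, the following minimal sketch counts system call n-grams in a trace. The function name `syscall_ngram_counts`, the choice of n-grams as the feature, and the toy trace are assumptions made for illustration; they are not taken from the MaDroid collection framework or its feature definitions.

```python
from collections import Counter
from typing import Iterable


def syscall_ngram_counts(trace: Iterable[str], n: int = 2) -> Counter:
    """Count contiguous n-grams of system call names in one trace.

    Hypothetical helper: the real dataset may use different features.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    calls = list(trace)
    # Slide a window of length n over the trace; short traces yield no n-grams.
    return Counter(tuple(calls[i:i + n]) for i in range(len(calls) - n + 1))


# Toy trace (assumed, not from the dataset).
toy_trace = ["openat", "read", "read", "close", "openat", "read"]
bigrams = syscall_ngram_counts(toy_trace, n=2)
```

Each resulting count could serve as one dimension of a feature vector for a model such as RF, MLP, or GBDT; how the counts are normalized or combined with VT rating values is left open here.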