Auditory perception-based person identification and localization could enable service robots to navigate among crowds in a more socially compliant manner. Compared with the widely studied camera-based approaches, it is also more user-friendly and raises fewer privacy concerns. One way to achieve such auditory person identification and localization is to identify and locate a subject by the sound of her/his footsteps. To the best of our knowledge, although several related acoustic identification benchmark datasets have been released, no dataset addresses acoustic identification and localization simultaneously. To fill this gap, we build the acoustic footstep-based person identification and localization dataset (AFPILD), which unifies the identification and localization tasks for the first time and covers clothing and shoe-type covariates from the perspective of human-aware mobile service robot navigation. Specifically, we record the walking sounds of 40 subjects with a lightweight microphone array in a real, unconstrained indoor room. To annotate ground-truth localization labels, we synchronously record point clouds of the walking subjects with a multi-line LiDAR. AFPILD contains nearly 10 h of footstep audio (around 62,174 footstep events). To the best of our knowledge, this is the largest and most comprehensive acoustic footstep dataset built for person identification and localization to date. We also present baseline works on the identification, localization, and joint identification-localization tasks, respectively, against which future approaches on this dataset can be compared. Experimental results show that the clothing and shoe-type covariates degrade footstep identification (by around 12% for clothing and 33% for shoe type), whereas their effects on localization are not evident.