Abstract

Distinguishing identities is useful for several applications such as automated grocery or personalized recommendations. Unfortunately, several recent proposals for identification systems are evaluated using poor recruitment practices. We discovered that 23 out of 30 surveyed systems used datasets with 20 participants or less. Those studies achieved an average classification accuracy of 93%. We show that the classifier performance is misleading when the participant count is small. This is because the finite precision of measurements creates upper limits on the number of users that can be distinguished. To demonstrate why classifier performance is misleading, we used publicly available datasets. The data was collected from human subjects. We created five systems with at least 20 participants each. In three cases we achieved accuracies greater than 90% by merely applying readily available machine learning software packages, often with default parameters. For datasets where we had sufficient participants, we evaluated how the performance degrades as the number of participants increases. One of the systems built suffered a drop in accuracy that was over 35% as the participant count increased from 20 to 250. We argue that data from small participant count datasets do not adequately explore variations. Systems trained on such limited data are likely to incorrectly identify users when the user base increases beyond what was tested. We conclude by explaining generalizable reasons for this issue and provide insights on how to conduct more robust system analysis and design.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.