An Approach to Extract and Compare Metadata of Human Activity Recognition (HAR) Data Sets

Gulzar Alam, Joseph Rafferty, Peter Nicholl, Ian Mcchesney

doi:10.1007/978-3-031-21333-5_71

Abstract

Open data and data sets are becoming increasingly available in human activity recognition (HAR) because of their importance in application areas such as improving people's lives, enabling informed care decisions, solving real-world problems, and choosing the best HAR approaches. However, curating and sharing open data and data sets remains challenging because shared data often lack metadata and complete descriptions. Properly curated data sets are easier to recognise, obtain, and reuse, which helps HAR research progress. In this paper, we propose a conceptual framework that describes the open data set lifecycle as four phases: construction, sharing, finding, and using. We also explore the open issues and challenges related to HAR data sets that are reported in the published literature. On this basis, we present an approach that automatically extracts metadata through web scraping of the HAR data sets and then applies a natural language processing (NLP) pipeline to detect the metadata of each data set. Using the retrieved metadata, we show how comparisons can be performed under different scenarios, which can help evaluate data set quality and identify areas for improvement in data set curation. This research will assist the HAR research community in better understanding the open data set lifecycle and how data set quality can be improved.
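To make the scrape-then-detect idea concrete, the following is a minimal sketch and not the pipeline used in the paper. It assumes the data set description is already available as an HTML string, and it substitutes simple regular expressions for the NLP step. The field names (`subjects`, `activities`, `sampling_rate_hz`), the patterns, and the helper functions are all hypothetical and chosen only for illustration.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping script and style content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical metadata fields; a real pipeline would use richer NLP
# (tokenisation, entity recognition) rather than fixed patterns.
PATTERNS = {
    "subjects": re.compile(r"(\d+)\s+(?:subjects|participants|volunteers)", re.I),
    "activities": re.compile(r"(\d+)\s+activities", re.I),
    "sampling_rate_hz": re.compile(r"(\d+(?:\.\d+)?)\s*Hz", re.I),
}


def extract_metadata(text):
    """Map each field to the first matching value, or None if not detected."""
    found = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        found[field] = match.group(1) if match else None
    return found


def missing_fields(meta_by_dataset):
    """For each data set, list the fields that could not be detected."""
    return {
        name: sorted(field for field, value in meta.items() if value is None)
        for name, meta in meta_by_dataset.items()
    }
```

Running `extract_metadata(html_to_text(page))` on each data set description and passing the results to `missing_fields` gives a simple comparison of how completely each data set is described, which is one possible way to flag gaps in curation.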

Full Text