What is a dataset?

Answer from top 10 papers

A dataset is a collection of data, often presented in a structured form such as a table, where each column represents a variable and each row corresponds to a single record (Gorsse et al., 2023). Datasets vary in size, complexity, and format, and they serve as the foundation for analysis in fields including big data research, manufacturing process monitoring, macromolecular crystallography, autonomous navigation, and eSports analytics (Koval et al., 2022; Ryman, 2015; Shraga & Miller, 2023; Somov et al., 2020; Tnani et al., 2022).
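The tabular view described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited papers: the records, the field names (`player_id`, `skill`, `mean_pulse_bpm`), and the helper functions `column` and `shape` are all hypothetical and exist only to show rows as records and columns as variables.

```python
# Hypothetical toy dataset: every field name and value below is invented
# for illustration and does not come from any cited paper.
records = [
    {"player_id": 1, "skill": "pro", "mean_pulse_bpm": 72.0},
    {"player_id": 2, "skill": "amateur", "mean_pulse_bpm": 85.5},
    {"player_id": 3, "skill": "pro", "mean_pulse_bpm": 78.5},
]


def column(rows, name):
    """Return the values of one variable (one column) across all records."""
    return [row[name] for row in rows]


def shape(rows):
    """Return (number of records, number of variables)."""
    n_cols = len(rows[0]) if rows else 0
    return len(rows), n_cols


# Each dict is one row (record); each key is one column (variable).
variables = list(records[0].keys())
pulse = column(records, "mean_pulse_bpm")
mean_pulse = sum(pulse) / len(pulse)

print("variables:", variables)
print("shape:", shape(records))
print("mean pulse:", round(mean_pulse, 2))
```

A list of dictionaries is only one possible representation; a table library or a database would express the same row and column structure.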
Datasets are not static entities. They can evolve over time, especially in collaborative environments where multiple versions may exist, which motivates frameworks such as Explain-Da-V that explain the changes between versions (Liu et al., 2011; Ma et al., 2023). Datasets can also be designed to capture a wide range of information, from acceleration data in industrial settings to physiological data in eSports, which highlights their versatility (Koval et al., 2022; Tnani et al., 2022).
In summary, datasets store and organize information so that researchers and practitioners can perform analyses, build models, and derive insights across domains. They change over time through updates and transformations, and they can be tailored to specific research needs or applications, as the diverse examples above show (Gorsse et al., 2023; Koval et al., 2022; Liu et al., 2011; Ma et al., 2023; Picard et al., 2021; Ryman, 2015; Shraga & Miller, 2023; Somov et al., 2020; Tnani et al., 2022).

Source Papers

Collection and Validation of Psychophysiological Data from Professional and Amateur Players: a Multimodal eSports Dataset

Proper training and analytics in eSports require accurately collected and annotated data. Most eSports research focuses exclusively on in-game data analysis, and there is a lack of prior work involving eSports athletes' psychophysiological data. In this paper, we present a dataset collected from professional and amateur teams in 22 matches of the League of Legends video game, with more than 40 hours of recordings. The recorded data include the players' physiological activity (e.g. movements, pulse, saccades) obtained from various sensors, a self-reported after-match survey, and in-game data. An important feature of the dataset is simultaneous data collection from five players, which facilitates the analysis of sensor data at the team level. After collecting the dataset, we validated it. In particular, we demonstrate that stress and concentration levels for professional players are less correlated, which suggests a more independent playstyle. We also show that the absence of team communication does not affect professional players as much as amateur ones. To investigate other possible use cases of the dataset, we trained classical machine learning algorithms for skill prediction and player re-identification using 3-minute sessions of sensor data. The best models achieved accuracy scores of 0.856 and 0.521 (0.10 for a chance level) on a validation set for the skill prediction and player re-identification problems, respectively. The dataset is available at https://github.com/smerdov/eSports Sensors Dataset.
