Abstract
Datasets can be an important aid in research, for example in the construction of an email parser, malware analysis, the improvement of special-purpose algorithms or the testing of tools. For datasets to be useful, they must possess three features: (1) quality, to ensure that results are accurate and generalizable, (2) quantity, to ensure that there are sufficient data to train and validate the tools and (3) availability, so that the research can be conducted and independently reproduced to ensure scientific validity. Further, funding agencies increasingly require that grantees make the results of their research available to the public. Most current datasets fail to meet these criteria, which weakens the reliability of research and testing and hinders the continued development of digital forensics. The majority of test datasets are deficient because (1) many researchers produce their own datasets, (2) datasets are not released after the work has been completed and (3) there is a lack of labelled, standardised datasets that can be used in research. These weaknesses result in low reproducibility, limited comparability and little peer-validated research. Over half of the datasets used in published research were experiment-generated: researchers created specific scenarios for their experiments because of the lack of available real-world datasets and of datasets created specifically for experiments on new technology.