Abstract

Spreadsheets are commonly used across most academic disciplines; however, their use has been associated with a number of issues that affect the accuracy and integrity of research data. In 2016, new training on spreadsheet curation was introduced at the University of Sydney to address a gap between practical software skills training and generalised research data management training. The approach to spreadsheet curation behind the training was defined, and the training's distinction from other spreadsheet curation training offerings was described.

The uptake of and feedback on the training were evaluated. Training attendance was analysed by discipline and by role. Quantitative and qualitative feedback were analysed and discussed. Feedback revealed that many attendees had expected and desired practical spreadsheet software skills training. Issues relating to whether practical skills training should and can be integrated with curation training were discussed. While attendees were found to be predominantly from science disciplines, qualitative feedback suggests that humanities attendees have specific needs in relation to managing data with spreadsheets that are currently not being met. Feedback also suggested that some attendees would prefer the curation training to be delivered as a longer, more in-depth, hands-on workshop.

The impact of the training was measured using data collected from the University's Research Data Management Planning (RDMP) tool and the Sydney eScholarship Repository. RDMP descriptions of spreadsheet data and records of tabular datasets published in the repository were analysed and assessed for quality and for accompanying data documentation. No significant improvements in data documentation or quality were found; however, it is likely too soon after the launch of the training program to have seen much in the way of impact.

Identified next steps include clarifying the marketing material promoting the training to better communicate the curation focus, investigating the needs of humanities researchers working with qualitative data in spreadsheets, and incorporating new material into the training in order to address those needs. Integrating curation training with practical skills training and modifying the training to be more hands-on are changes that may be considered in future, but will not be implemented at this stage.

Highlights

  • The Problem: The use of spreadsheets for collecting, analysing, and storing research data is common, but not without potential problems. Ziemann, Eren, and El-Osta (2016) alerted the genomics research community to issues with data integrity in their field due to Microsoft Excel’s automatic format conversion; a high-profile paper in economics was refuted in part on the basis of a spreadsheet error in data analysis (Herndon, Ash and Pollin, 2013); and Barchard and Pace (2011) document the impact of the type of data entry methods commonly used with spreadsheets on data accuracy and statistical results.

  • Spreadsheet curation requires a qualitative judgement on what features of the data are significant and/or necessary in order to understand and use the data, and different curation strategies will apply on the basis of this judgement

  • Spreadsheet curation training is the first instalment in a series of best practice research data management training sessions, designed to complement existing and new training offerings

