Abstract

Primary data collected during a research study is often shared and may be reused for new studies. To assess the extent of data sharing in favourable circumstances and whether data sharing checks can be automated, this article investigates summary statistics from primary human genome-wide association studies (GWAS). This type of data is highly suitable for sharing because it is a standard research output, is straightforward to use in future studies (e.g., for secondary analysis), and may already be stored in a standard format for internal sharing within multi-site research projects. Manual checks of 1799 articles from 2010 and 2017 matching a simple PubMed query for molecular epidemiology GWAS were used to identify 314 primary human GWAS papers. Of these, only 13% reported the location of a complete set of GWAS summary data, increasing from 3% in 2010 to 23% in 2017. Whilst information about whether data was shared was typically located clearly within a data availability statement, the exact nature of the shared data was usually unspecified. Thus, data sharing is the exception even in suitable research fields with relatively strong data sharing norms. Moreover, the lack of clear data descriptions within data sharing statements greatly complicates the task of automatically characterising shared data sets.

Highlights

  • Research data sharing is increasingly encouraged by funders and journals on the basis that data reuse can improve research efficiency and transparency [1,2]

  • Out of all 314 articles classified as primary human genome-wide association studies (GWAS), 13% reported sharing GWAS summary statistics in some form, increasing substantially from 3% in 2010 to 23% in 2017 (Table 1)

  • Data sharing statements often did not specify the type of data, so those that were offered by email or by request may not include complete GWAS summary statistics



Introduction

Research data sharing is increasingly encouraged by funders and journals on the basis that data reuse can improve research efficiency and transparency [1,2]. Field cultures and data infrastructure both help to encourage data sharing [6] and researchers seem increasingly willing to publish their data [7]. This may generate citations to the data, originating paper or authors to recognise this effort [8,9,10,11,12,13], which is a useful incentive [14].

