The sharing of research data for new science forms an important part of the UK Government's transparency agenda and its open data movement. The Medical Research Council (MRC) invests around £700 million of public money in health research every year, the primary output of which is data. The value of this investment can be maximised through the sharing and further use of such data for the benefit of public health research. The MRC's research data policy aims to maximise the lifetime value of research data assets for human health in a timely and responsible manner, in line with the principles and guidelines from the Organisation for Economic Co-operation and Development (OECD) for access to research data from public funding.

Analysis by Piwowar has shown that in microarray clinical trials, publicly available research data are significantly associated with a 69% increase in citations. At the same time, data sharing between researchers varies widely within the health sciences, with investigators of cancer and patient studies least likely to make their datasets available. Thus data are least available in the areas in which they could have the greatest effect.

Research to understand human health and to assess interventions to improve health depends on information about the health, lifestyle, genetics, and environments (social, economic, and physical) of populations and patients. Data sharing is essential because researchers need access to large samples across populations to be able to study the complex effects of genetic, environmental, and lifestyle factors on diseases. Furthermore, data and the concepts studied need to be standardised and harmonised to enable comparability of measurements and attributes across samples.

Within this framework of research needs, the MRC Data Support Service project developed a Research Data Gateway for the discovery of MRC-funded population and patient studies and their datasets and variables.
The Gateway enables researchers to find and explore variables across longitudinal cohort studies, to support data linkage for new research. A federated approach is used, whereby investigators of studies are responsible for storage, preservation, curation, and dissemination of data, and for publishing standardised metadata to the Gateway. The system uses a Drupal content management system and Apache Solr search and browse functionality, with metadata organised into modular units representing studies, time periods, collection events, and variables. Researchers can search and discover variables across studies and export baskets of variables to request access to data. The directory already holds more than 45 000 variables for four studies: Avon Longitudinal Study of Parents and Children (ALSPAC), National Survey of Health and Development (NSHD), Southampton Women's Survey (SWS), and Whitehall II. Inclusion of more variables and studies is under development.

Development towards a Data Documentation Initiative–Lifecycle (DDI3-L) metadata exchange standard is in progress, so that metadata in diverse formats and structures can be ingested into the Gateway as comparable, standardised metadata.

The project also works towards integration of this discovery platform with the Cohorts and Longitudinal Studies Enhancement Resource (CLOSER), a joint ESRC/MRC-funded project facilitating cross-disciplinary research across cohort studies, so that the MRC metadata directory can underpin CLOSER's future data harmonisation work.

Funding: Medical Research Council.
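
As an illustration of the modular metadata organisation described above (studies, time periods, collection events, and variables) and of searching variables across studies and exporting a basket, the following is a minimal Python sketch. It is not the Gateway's implementation, which is built on Drupal and Apache Solr. All class names, function names, study names, and variable names here are hypothetical and chosen only for illustration.

```python
import csv
import io
from dataclasses import dataclass, field


# Hypothetical model of the four modular metadata units named in the text.
@dataclass(frozen=True)
class Variable:
    name: str
    label: str


@dataclass
class CollectionEvent:
    name: str
    variables: list = field(default_factory=list)


@dataclass
class TimePeriod:
    name: str
    events: list = field(default_factory=list)


@dataclass
class Study:
    name: str
    periods: list = field(default_factory=list)


def search_variables(studies, term):
    """Return (study, period, event, variable) tuples whose variable
    name or label contains the search term (case-insensitive)."""
    term = term.lower()
    hits = []
    for study in studies:
        for period in study.periods:
            for event in period.events:
                for var in event.variables:
                    if term in var.name.lower() or term in var.label.lower():
                        hits.append((study.name, period.name, event.name, var.name))
    return hits


def export_basket(hits):
    """Serialise a basket of selected variables as CSV text, for example
    to attach to a data access request sent to the study investigators."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["study", "period", "event", "variable"])
    writer.writerows(hits)
    return buffer.getvalue()


# Placeholder studies and variables; not taken from any real cohort.
studies = [
    Study("Study A", [TimePeriod("Childhood", [CollectionEvent("Clinic visit 1", [
        Variable("height_cm", "Standing height (cm)"),
        Variable("weight_kg", "Body weight (kg)"),
    ])])]),
    Study("Study B", [TimePeriod("Adulthood", [CollectionEvent("Postal questionnaire", [
        Variable("ht", "Self-reported height"),
        Variable("smoker", "Current smoking status"),
    ])])]),
]

basket = search_variables(studies, "height")
print(export_basket(basket))
```

In this sketch the basket holds only metadata paths, mirroring the federated approach in which the data themselves remain with the study investigators.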