Abstract

Background: Data discovery, particularly the discovery of key variables and their inter-relationships, is key to secondary data analysis and, in turn, to the evolving field of data science. Interface designers have presumed that their users are domain experts, and so they have provided complex interfaces to support these “experts.” Such interfaces hark back to a time when searches needed to be accurate first time, as there was a high computational cost associated with each search. Our work is part of a governmental research initiative between the medical and social research funding bodies to improve the use of social data in medical research.

Objective: The cross-disciplinary nature of data science means that no assumptions can be made regarding the domain expertise of a particular scientist, whose interests may intersect multiple domains. Here we consider the common requirement for scientists to seek archived data for secondary analysis. This has more in common with the search needs of the “Google generation” than with those of their single-domain, single-tool forebears. Our study compares a Google-like interface with traditional ways of searching for noncomplex health data in a data archive.

Methods: Two user interfaces were evaluated on the same set of tasks, which involved extracting data from surveys stored in the UK Data Archive (UKDA). One interface, Web search, is “Google-like,” enabling users to browse, search for, and view metadata about study variables, whereas the other, traditional search, has a standard multioption user interface.

Results: Using a comprehensive set of tasks with 20 volunteers, we found that the Web search interface met data discovery needs and expectations better than the traditional search. A task × interface repeated measures analysis showed a main effect of interface, indicating that answers found through the Web search interface were more likely to be correct (F(1,19)=37.3, P<.001), and a main effect of task (F(3,57)=6.3, P<.001). Further, participants completed the task significantly faster using the Web search interface (F(1,19)=18.0, P<.001). There was also a main effect of task (F(2,38)=4.1, P=.025, Greenhouse-Geisser correction applied). Overall, participants were asked to rate learnability, ease of use, and satisfaction. Paired mean comparisons showed that the Web search interface received significantly higher ratings than the traditional search interface for learnability (P=.002, 95% CI [0.6-2.4]), ease of use (P<.001, 95% CI [1.2-3.2]), and satisfaction (P<.001, 95% CI [1.8-3.5]). The results show superior cross-domain usability of Web search, which is consistent with its general familiarity and with its support for refining queries as the search proceeds, treating serendipity as part of the refinement.

Conclusions: The results provide clear evidence that data science should adopt single-field natural language search interfaces for variable search, supporting in particular: query reformulation; data browsing; faceted search; surrogates; relevance feedback; summarization, analytics, and visual presentation.

Highlights

  • In each of these areas the strategy proposes specific actions to be achieved by a particular date

  • Research lies at the heart of innovation and knowledge creation

  • The vast increase in information arising from the digital revolution has the potential to improve further and accelerate research efforts, provided that the requisite data resources needed for scientific research can be collected, marshalled and preserved in ways that facilitate high quality research

Summary

EXECUTIVE SUMMARY

This document presents the UK Strategy for Data Resources for Social and Economic Research (the ‘National Data Strategy’). The National Data Strategy:

  • Builds on success – by sustaining major new longitudinal resources and developing improved access to existing and forthcoming cross-sectional data sources

  • Strengthens recent developments in data services – via measures to support the newly-founded Secure Data Service and Administrative Data Liaison Service and to ensure the continued success of the Economic and Social Data Service

  • Explores and promotes research use of the new types of data arising from digitisation – transactions data and ‘tracking records’

  • Encourages the development of procedures, protocols and standards – supporting ethical safeguards surrounding data access and reuse whilst facilitating access for research purposes

  • Helps to ‘internationalise’ the research agenda – establishing better procedures for data discovery for data held outside the UK and by encouraging use of UK data resources by the international research community

  • Seeks to improve awareness of the research value of data – about new developments and the potential of existing data resources among research users

  • Recognises the added complexity and co-ordination requirements resulting from devolution – thereby facilitating comparative research across the countries of the United Kingdom

  • Plans for the future – by helping to develop a strategic approach to the long term funding, sustainability and preservation of major data resources

In each of these areas the strategy proposes specific actions to be achieved by a particular date.

To be achieved by: September 2010; June 2011; Dec 2011; Dec 2011; Oct 2012; Dec 2012; Dec 2011

INTRODUCTION
Background
SECTION 1 The challenges driving data needs
SECTION 2 Building on the foundations of the first National Data Strategy
SECTION 3 The digital revolution – proving research value and setting priorities
SECTION 4 A strategy for international data requirements
Developing data sharing protocols and data access agreements
SECTION 6 From plans to action: priorities for 2009 to 2012
Cross-sectional data on individuals and households
Data on and from organisations
Longitudinal data on people and households
Developing international data resources