Abstract
Recent years have witnessed a revolution in data science and ‘big data’ into which psychiatric research has also been drawn. ‘Big data’ has been defined as data sets which are so large in size, so fast to change, and so complex in structure that traditional data processing techniques are overwhelmed [1]. The mining and exploitation of such big data resources as Electronic Healthcare Records (EHRs) present an exciting challenge to the field of psychiatric epidemiology. The number of big data projects within psychiatric research is growing, and Stewart and Davis’s literature review is, therefore, timely [2]. Technological advances in data processing and storage, computer networking, mobile technology, and data manipulation have rendered huge quantities of healthcare-related data potentially amenable to analysis. Such studies potentially offer much larger patient numbers, wider parameters of study, and longer timescales of follow-up than are typical of randomised controlled trials (RCTs) or cohort studies. A further advantage of these data is that they often arise from naturalistic clinical settings, in terms of both clinical practice and patient health and comorbidity. Indeed, while often considered the ‘gold standard’ in medical research, RCTs do have important limitations, such as overly strict exclusion criteria. Routine clinical data sets can, therefore, be complementary to RCT data, while also making research findings more relevant to everyday clinical practice. In addition, the quantity of clinical ‘big data’ potentially allows analysis of rarer clinical conditions, or of subject areas that would be unlikely to receive ethical approval for more conventional studies (for example, medication usage in pregnancy). Big data studies provide the scale and breadth of patient numbers required for stratified, predictive, and personalised medicine research.
In addition, notwithstanding the many challenges of working with big data, large-scale analysis of routinely collected healthcare data sets has already demonstrated effectiveness in fields such as pharmacovigilance [3] and post-marketing clinical trials. As described in detail by Stewart and Davis, the challenging aspects of working with ‘big data’ are captured by the taxonomy of ‘Vs’ (volume, velocity, and variety) first described by Laney [4] and extended since [1, 2]. Healthcare big data raise additional issues of their own: clinical administrative data are generally not collected, curated, or formatted in a manner optimised for research, and the data are inherently sensitive in terms of personal privacy. As the possibilities for healthcare big data expand into such areas as text mining and natural language processing of clinical records, near ‘real-time’ updating of repositories, and incorporation of new data streams from mobile and wearable technology, robust systems must be in place to channel the potential data deluge. How great a role ‘cloud’ computing and storage will have in this is an intriguing question. In healthcare big data, the benefits offered by the cloud in increased power and accessibility, and reduced cost, must be weighed against concerns over data ownership, encryption, and unauthorised access. An additional question is whether unique patient identifiers (UPIs) utilised by some national healthcare systems can be extended more
This comment refers to the article available at doi:10.1007/s00127-016-1266-8.