GEN Biotechnology, Vol. 2, No. 3
Commentaries

Breaking the Chains of Health Data

Philip Russmeyer*
FITFILE, London, United Kingdom.
*Address correspondence to: Philip Russmeyer, FITFILE, 165–167 Great Portland St, London W1W 5PF, United Kingdom. E-mail: philip.russmeyer@fitfile.com

Published Online: 19 Jun 2023
https://doi.org/10.1089/genbio.2023.0026

To release the brakes on life sciences research, we must break health data out of silos, writes the CEO of FITFILE.

The global biotechnology industry is responsible for many of the biggest changes to health care delivery, human longevity, and population-level health outcomes over the past decade. From CRISPR to immunotherapy and precision medicine, advances in biotechnology are shaping the future of our existence in an increasingly complex and challenging environment.

But this complex future demands faster and more connected innovation on a much larger scale than is currently possible. As things stand, the life sciences sector is struggling to operate at the scale or at the pace that is required to meet the challenges.

The biggest roadblock to escalating innovation across the life sciences is not funding, nor talent, and certainly not ambition.

Rather, the chief roadblock is data. Not a lack of it; quite the opposite. We live in a world rich in health data: data from hospitals and providers, wearable devices, research institutions, government databases, private providers, and numerous other sources.

Unfortunately, this wealth of information is not being effectively harnessed. No one is yet properly joining the dots and leveraging the patchworked insights at the right scale.
That is not for want of trying: ambitious data unification projects include the United Kingdom's 100,000 Genomes Project, an initiative to sequence the genomes of 100,000 people with rare diseases and cancer. The project involved the unification of data from multiple sources, including clinical data, genomic data, and health care records, and outcomes included the discovery of 19 new disease–gene associations.

Life sciences are not progressing in parallel with the proliferation of relevant data points because existing data infrastructures make it too difficult, too risky, and too expensive to unite and analyze real-world data on a continuous basis. Time-limited one-off projects such as 100,000 Genomes are a promising starting point, but the real benefit will come when ongoing record-level data unification is enabled across entire populations.

Straddling Silos

If, for example, researchers in one organization are working on developing new cancer treatments, they benefit hugely from the broadest possible access to cancer patients' anonymized health records. They would also benefit from access to results of similar research carried out by other organizations, where those results are not commercially sensitive. But as things currently stand, securely accessing these data, at the appropriate scale and in a useful form, is a painfully slow process, if it is possible at all. Health records are often fragmented, with varying levels of detail and completeness, and there is no standard format for the collection and storage of health data, even within organizations such as the National Health Service (NHS).

Almost without exception, valuable clinical and nonclinical health, research, demographic, and activity data are locked up behind prohibitive privacy barriers, which include the General Data Protection Regulation (GDPR) across Europe, and the Health Insurance Portability and Accountability Act (HIPAA) and the California Consumer Privacy Act (CCPA) in the United States.
For many use cases, specific consent or a legal basis such as legitimate patient care is required in Europe to view identifiable (including tokenized) data.

According to research published by Capgemini,[1] many biopharma companies see data protection law compliance and privacy as a major challenge when launching new products and services. Only one-quarter of the more than 500 biopharma respondents to the Capgemini survey said they have a cloud computing platform in place to integrate data from different sources, such as electronic medical records from the clinician's office or sensor data from wearables. And fewer than a third have common frameworks and tools for data collection, analysis, and management of internal and external data.

A lack of awareness of how proper investment in data infrastructure converts to return on investment is partly responsible for this, but concerns over interoperability, adherence to privacy regulations, and the need to obtain patient consent are also getting in the way of integration projects.

When researchers can only access a fragment of the existing data, it is like trying to solve a puzzle when some of the pieces have fallen through gaps in the floorboards: frustratingly impossible.

Inadequate, outdated processes for accessing and sharing data within and between organizations are slamming the brakes on research. Patterns, correlations, and trends remain undetected. Research and investigations are needlessly duplicated. Lessons learnt are not lessons shared. Partial data result in estimation biases being embedded into models, and in expensive delays.
While treatments stall in the development phases and postmarket products lack supportive real-world evidence, patients are suffering. Due to inefficient clinical trial recruitment and associated delays, insufficiently demonstrated product value and safety, and imprecisely targeted commercial activities, an estimated $243.1 billion is being wasted each year in the global life sciences industry.

As shown in Table 1, if life science organizations could unite internal and external clinical, nonclinical, and operational data, they would benefit considerably. For research teams, study feasibility and site selection would become easier with more granular and dependable information, and appropriate clinical trial participants could be recruited more quickly and efficiently. Manufacturing teams could plan more accurately, and commercial teams could gain a more nuanced understanding of patient or clinician needs, experiences, and responses.

Table 1. Examples of the benefits from united health data

| Example beneficiary | Benefits from united health data | Example use case | Real-world application |
| --- | --- | --- | --- |
| Biopharma R&D | Large-scale record-level insights from real-world populations to inform selection and adaptive iteration of drugs, drug targets, study subjects, and endpoints | Efficient and effective clinical study protocols and execution | Alzheimer's Disease Neuroimaging Initiative (ADNI) and The Cancer Genome Atlas Program (TCGA) |
| Broader life science R&D | Barrier-free, safe, and secure access to real-world data | Discovery of new disease–gene association | 100,000 Genomes Project |
| Health care providers | Real-time monitoring of treatment effectiveness | Rapid response during public health crisis | United Kingdom RECOVERY Trial |
| Regulatory bodies | Timely shared insights between organizations and countries | Accelerates process of assessing a medicine's safety and effectiveness | DARWIN EU |

All of this would bring more effective new therapeutics to market at a faster pace, more efficiently targeted at best responders and at lower cost. With near real-time tracking of data across silos, it would also be easier to support value- and outcomes-based reimbursement models (for payors, providers, and biopharma), and to deliver more accurate safety profiling.

At the start of the United Kingdom's first COVID-19 lockdown in March 2020, the Randomised Evaluation of COVID-19 Therapy (RECOVERY) trial was launched. In the first 100 days, this trial showed that neither hydroxychloroquine nor lopinavir–ritonavir was an effective treatment, whereas dexamethasone was proven to reduce deaths by up to one-third in hospitalized patients.[2] Dexamethasone later saved over a million lives during the pandemic, which would not have been possible if the researchers had not been able to access real-world, real-time data (from more than 48,000 participants and 192 sites) for the purposes of recruitment, treatment allocation, success monitoring, and statistical analysis.

Today, on a similarly ambitious scale, the European Medicines Agency (EMA) and the European Medicines Regulatory Network are spearheading a potentially transformative data project. The Data Analysis and Real World Interrogation Network (DARWIN EU®), which is intended to be fully operational by 2024, will provide timely and reliable evidence on the use, safety, and effectiveness of medicines for human use, including vaccines, from real-world health care databases across the European Union (EU). DARWIN EU will support regulatory decision making with high-quality, validated real-world data.

Facing Up to the Privacy Problem

Building infrastructures that enable convenient, connected data access is one thing, but ensuring that patient privacy is fully and permanently preserved is arguably a larger problem altogether. It is critical to assemble complete health profiles from vital data elements stuck in numerous data silos, but historically it has only been possible to bring those data elements together in identifiable or reversibly deidentified form.
To protect privacy, the GDPR still treats even the previous state of the art for record-level unification, “pseudonymization (or tokenization) at source,” as risky identifiable data.

As highlighted in a 2020 article by Barnes and colleagues,[3] this has been holding back the secondary use of health data, in turn stalling innovation in the life sciences, slowing the pace of drug discovery, and holding back precision medicine by preventing the required stratification into best responders for personalized treatments.

To ensure privacy and reduce risk, the best tools now available should be used to ensure data are anonymized at source and left at source whenever possible, while access and linkage are fully controlled and tracked. Repeatedly demonstrating the security and the sophistication of privacy protections will help build the public trust and necessary scale that are crucial to the success of data unification projects, including the Alzheimer's Disease Neuroimaging Initiative (ADNI) and The Cancer Genome Atlas Program (TCGA). This will require transparency and flexibility as regulations evolve, and a move toward the widespread application of anonymization and computation of data at source.

Releasing the brakes on progress and innovation in the life sciences is not going to happen overnight, and rightly so. Secure, safe, and effective data access and unification must be done properly: time must be taken to appoint reliable solution providers and deploy the best tools.

Success and trust will be won from moving purposefully toward a more collaborative way of working with data: one that puts patient privacy at the center. This is a reality that is getting closer, and more achievable, every day, as the most talented people in the healthtech ecosystem turn their attention to finally solving the data puzzle.

Philip Russmeyer is the CEO and founder of FITFILE.