Health and Social Science Data—The Need to Reform Canada's Research Infrastructure

Abstract

Data collection, curation and analysis are fundamental to leading-edge research in the social and health sciences. Despite the role played by Statistics Canada and the Canadian Institute for Health Information (CIHI), important data remain far too siloed within the provinces. Critical analyses cannot be undertaken because Canada's research funding structures are inadequate to provide support at an acceptable scale. The Advisory Panel on the Federal Research Support System addressed some of these issues in 2022, and they subsequently became part of the 2024 federal budget and of the Liberal Party of Canada's 2025 election platform. However, it is not clear that the upcoming reform initiatives will be sufficient. This paper presents a number of use cases that show the enormous potential in fields that have long faced serious challenges, partly because of the limits of current research funding structures. Finally, it argues that health and social science data infrastructure must be recognized as "big science" on a par with particle accelerators and large telescopes.

Similar Papers
  • Book Chapter
  • 10.1007/978-3-032-07005-0_1
Open Science and the Role of Social Sciences Research Infrastructures and Data
  • Dec 11, 2025
  • Francesca Di Donato

The first chapter examines Open Science, tracing the origins of the foundational approaches, concepts and principles that advocate inclusion, quality, transparency and collaboration in research. The chapter emphasises key principles such as collaboration and sharing, which are crucial for advancing Open Science. Research infrastructures (RIs) are a crucial axis of the Open Science landscape, and the chapter highlights their role in enabling and facilitating the adoption of Open Science practices. RIs are fundamental to defining the European Open Science Cloud, the overarching framework within which the European Commission has been building the European Research Area for the past 10 years. In particular, social science data and open panel data can play a crucial role in conducting the multidisciplinary, multilingual and inclusive research needed to respond to today’s challenges.

  • Research Article
  • Cited by 3
  • 10.29173/iq972
Sharing qualitative research data, improving data literacy and establishing national data services
  • Jan 2, 2020
  • IASSIST Quarterly
  • Karsten Boye Rasmussen


  • Research Article
  • Cited by 22
  • 10.1177/16094069231185452
Performing Qualitative Content Analysis of Video Data in Social Sciences and Medicine: The Visual-Verbal Video Analysis Method
  • Aug 23, 2023
  • International Journal of Qualitative Methods
  • Sahar Fazeli + 2 more

Videos are ubiquitous and have significantly affected how we communicate and consume information. As data, video has helped researchers understand how human interactions and relationships develop and change, and how patterns emerge across various circumstances and interpretations. Given the expanding relevance of video data in social science and medical research and the constant introduction of new formats and sources, it is critical to be able to analyze this multimodal data thoroughly. However, the few methodologies suited to video data analysis (e.g., Actor Network Theory, Picture Theory) lack detailed guidelines on how to select, organize and examine the multimodality of video data. This article aims to close this practical and methodological gap by proposing and demonstrating the Visual-Verbal Video Analysis (VVVA) method, a six-step framework adapted from Multimodal Theory and Visual Grounded Theory for organizing and evaluating video material along the following dimensions: general characteristics of the video; multimodal characteristics; visual characteristics; characteristics of primary and secondary characters; and content and compositional characteristics, including the transmission of messages, emotions and discourses. The article also examines the theories underlying video data analysis, focusing on Grounded Theory and Multimodality Theory, and provides multiple examples of coding and interpretive processes to deepen understanding. The VVVA data extraction matrices provide a systematic coding approach for verbal, visual and textual content, allowing structured, coherent extraction that supports the discovery of patterns and links among disparate types of information.
The VVVA method may be applied to a wide range of video data in the social and medical sciences that vary in length and originate from different sources (e.g., open-access web sources, pre-recorded organizational videos and recordings created for research purposes). The VVVA method effectively tracks the ongoing research process and can manage data sets of various sizes.

  • Research Article
  • Cited by 61
  • 10.2196/23139
Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation.
  • Nov 16, 2020
  • Journal of Medical Internet Research
  • Khaled El Emam + 2 more

Background: There has been growing interest in data synthesis as a way to enable data sharing for secondary analysis; however, a comprehensive privacy risk model for fully synthetic data is needed: if the generative models have been overfit, it is possible to identify individuals from synthetic data and learn something new about them.
Objective: The purpose of this study is to develop and apply a methodology for evaluating the identity disclosure risks of fully synthetic data.
Methods: A full risk model is presented, which evaluates both identity disclosure and the ability of an adversary to learn something new if a synthetic record matches a real person. We term this “meaningful identity disclosure risk.” The model is applied to samples from the Washington State Hospital discharge database (2007) and the Canadian COVID-19 cases database. Both datasets were synthesized using a sequential decision tree process commonly used to synthesize health and social science data.
Results: The meaningful identity disclosure risk for both synthesized samples was below the commonly used 0.09 risk threshold (0.0198 and 0.0086, respectively), and 4 times and 5 times lower, respectively, than the risk values for the original datasets.
Conclusions: We have presented a comprehensive identity disclosure risk model for fully synthetic data. The results for this synthesis method on 2 datasets demonstrate that synthesis can reduce meaningful identity disclosure risks considerably. The risk model can be applied in the future to evaluate the privacy of fully synthetic data.

  • Research Article
  • 10.1186/s12963-026-00456-7
Multilevel regression and poststratification interface: an application to track community-level COVID-19 viral transmission.
  • Feb 3, 2026
  • Population health metrics
  • Yajuan Si + 4 more

Public health surveillance systems require high-quality data to represent the population. In the absence of comprehensive or random testing throughout the COVID-19 pandemic, we have developed a proxy method for synthetic random sampling to estimate the actual community-level viral incidence, based on viral testing of patients who are asymptomatic and present for elective procedures within a hospital system. The approach collects routine testing data on SARS-CoV-2 exposure among outpatients and performs statistical adjustments of sample representation using multilevel regression and poststratification (MRP), a procedure that adjusts for nonrepresentativeness of the sample and yields stable small group estimates. We extend MRP to accommodate time-varying data and granular geography. We have developed an open-source, user-friendly MRP interface for public implementation of the Bayesian analysis workflow. We illustrate the MRP interface with an application to track community-level COVID-19 viral transmission in Michigan. We present the estimated infection rate over time for the targeted population and across demographic and geographic subpopulations. The interface provides timely, substantive insights into population health trends and serves as a valuable surveillance tool for future epidemic preparedness. Beyond monitoring COVID-19, the MRP interface can analyze a wide range of health and social science data, making it broadly applicable to diverse research areas with reproducibility and scientific rigor.
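The poststratification step mentioned above (the "P" in MRP) can be sketched as a population-weighted average of cell-level estimates. This is a minimal Python sketch under stated assumptions. The cell definitions, population counts and per-cell infection rates are hypothetical placeholders. In MRP, the cell estimates would come from the fitted multilevel regression, which is not shown here. The sketch is not the interface described in the paper.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """One hypothetical poststratification cell, e.g. an (age group, area) combination."""
    name: str
    population: int   # known population count for the cell (e.g. from a census)
    estimate: float   # model-based estimate for the cell (assumed given here)


def poststratify(cells: list) -> float:
    """Population-weighted average of cell-level estimates."""
    total = sum(c.population for c in cells)
    if total <= 0:
        raise ValueError("total population must be positive")
    return sum(c.population * c.estimate for c in cells) / total


# Hypothetical cells and rates, for illustration only.
cells = [
    Cell("18-39, urban", 40_000, 0.020),
    Cell("18-39, rural", 10_000, 0.010),
    Cell("40+, urban", 30_000, 0.015),
    Cell("40+, rural", 20_000, 0.005),
]

print(round(poststratify(cells), 4))  # 0.0145
```

Subgroup estimates, such as the demographic and geographic breakdowns the abstract mentions, follow from the same function applied to the subset of cells that make up each subgroup.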

  • Single Report
  • 10.61700/7xljx1ldaz67l2117
Bayesian Multilevel Modelling for Ecologists
  • Jan 1, 2025
  • Niamh Mimnagh

A hands-on workshop teaching Bayesian multilevel and multivariate modelling for ecological (and allied health and social science) data using brms with Stan, covering priors, MCMC diagnostics, GLMs, random effects, temporal and spatial dependence, and joint species distribution models. Participants leave with reproducible, publication-ready workflows for model building, criticism, visualization, and reporting—able to specify and defend priors, diagnose and fix convergence or fit issues, and translate posterior inference into management, policy, or research outputs.

  • Research Article
  • Cited by 138
  • 10.3389/fpsyg.2018.00580
Structural Equation Modeling With Many Variables: A Systematic Review of Issues and Developments.
  • Apr 25, 2018
  • Frontiers in Psychology
  • Lifang Deng + 2 more

Survey data in the social, behavioral and health sciences often contain many variables (p). Structural equation modeling (SEM) is commonly used to analyze such data. With a sufficient number of participants (N), SEM enables researchers to easily set up and reliably test hypothesized relationships among theoretical constructs, as well as those between the constructs and their observed indicators. However, SEM analyses with small N or large p have been shown to be problematic. This article reviews issues and solutions for SEM with small N, especially when p is large. The topics addressed include methods for parameter estimation, test statistics for overall model evaluation, and reliable standard errors for evaluating the significance of parameter estimates. Previous recommendations on the required sample size N are also examined together with more recent developments. In particular, the N required by conventional methods can be much larger than expected, whereas recent advances can reduce this requirement substantially. The issues and developments for SEM with many variables described in this article not only make applied researchers aware of cutting-edge methodology for SEM with big data characterized by a large p, but also highlight the challenges that methodologists face in further investigation.

  • Research Article
  • Cited by 1
  • 10.1029/2021eo158802
Integrating Data to Find Links Between Environment and Health
  • May 26, 2021
  • Eos
  • Zhong Liu + 3 more

Several obstacles stand in the way of integrating social, health, and Earth science data for vital geohealth studies, but there are tools and opportunities to overcome these obstacles.

  • Research Article
  • 10.1007/s00799-025-00416-w
CESSDA data catalogue: an opportunity to enhance data in social sciences
  • Mar 1, 2025
  • International Journal on Digital Libraries
  • Filippo Accordino + 2 more

This work offers an overview of the data deposited in the European archives belonging to CESSDA (the Consortium of European Social Science Data Archives), describing them and highlighting critical issues in metadata management that archives should address in the data ingestion procedure. The main purposes are: i) to assess the quality (completeness and accuracy) of metadata and the use of controlled vocabularies; ii) to describe the features of deposited datasets; iii) to highlight the critical points in metadata compilation. To perform the analysis, the authors used metadata from all datasets collected by the national archives, retrieved from the CESSDA Data Catalogue. The results show the degree of completeness and accuracy achieved by the archives and their use of controlled vocabularies. The metadata analysis shows which types of data are most frequent or currently available, highlighting the characteristics of the content in terms of topics, as well as some recurring methodological features of data collection. The evaluation of metadata quality provides guidance for archives to improve the data ingestion process. The results highlight the responsibility of archives and research infrastructures in promoting the correct production of metadata and ensuring compliance with the FAIR Principles, especially in terms of findability and interoperability.
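The completeness and controlled-vocabulary checks described above can be illustrated with a short Python sketch. It makes two assumptions. First, "completeness" is taken to mean the share of records in which a metadata field is non-empty. Second, "vocabulary conformance" is taken to mean the share of filled values drawn from an allowed list. The field names, vocabulary terms and records are all hypothetical. They do not reproduce the CESSDA Data Catalogue schema or the authors' exact metrics.

```python
from typing import Optional

# Hypothetical metadata fields; not the actual CESSDA Data Catalogue schema.
FIELDS = ["title", "abstract", "keywords", "collection_mode", "license"]

# Hypothetical controlled vocabulary for the data collection mode field.
MODE_VOCABULARY = {"CAPI", "CATI", "CAWI", "PAPI"}


def is_filled(value: Optional[str]) -> bool:
    """Treat missing keys, None and whitespace-only strings as empty."""
    return value is not None and value.strip() != ""


def field_completeness(records: list, fields: list) -> dict:
    """Share of records in which each field is filled in."""
    if not records:
        return {field: 0.0 for field in fields}
    return {
        field: sum(is_filled(r.get(field)) for r in records) / len(records)
        for field in fields
    }


def record_completeness(record: dict, fields: list) -> float:
    """Share of the listed fields that are filled in for a single record."""
    return sum(is_filled(record.get(field)) for field in fields) / len(fields)


def vocabulary_conformance(records: list, field: str, vocabulary: set) -> float:
    """Share of filled values for `field` that belong to the controlled vocabulary."""
    values = [r[field] for r in records if is_filled(r.get(field))]
    if not values:
        return 0.0
    return sum(v.strip() in vocabulary for v in values) / len(values)


# Hypothetical records, for illustration only.
records = [
    {"title": "Survey A", "abstract": "Wave 1", "keywords": "health",
     "collection_mode": None, "license": "CC-BY"},
    {"title": "Survey B", "abstract": "", "keywords": None,
     "collection_mode": "CAPI", "license": "CC-BY"},
    {"title": "Panel C", "abstract": "Pilot", "keywords": "panel",
     "collection_mode": "web survey"},
]

print(field_completeness(records, FIELDS))
print(record_completeness(records[1], FIELDS))  # 0.6
print(vocabulary_conformance(records, "collection_mode", MODE_VOCABULARY))  # 0.5
```

The free-text value "web survey" is counted as filled, which keeps completeness high. It is also counted as non-conforming, which lowers vocabulary conformance. Separating the two measures in this way shows why the abstract treats completeness and controlled-vocabulary use as distinct quality dimensions.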

  • Book Chapter
  • 10.1007/978-3-032-07005-0_2
Secondary, Longitudinal, and Panel Data in Social Science Research
  • Aug 27, 2025
  • Ilaria Primerano + 4 more

Secondary data are a vital component of the social scientist's toolkit. Among them, panel and longitudinal data are especially valuable for observing changes within a sample population over time. This chapter highlights the power of these data sources, through which social scientists can uncover valuable insights that inform policy decisions, shape interventions and contribute to our understanding of the world. For public institutions, universities and research institutions that need a timely reading of social phenomena, the speed with which the phenomena under study are detected and analyzed is a significant added value. This speed makes research infrastructures for longitudinal and panel data in the social sciences unique and allows them to fill a crucial information gap.

  • Research Article
  • 10.18844/gjit.v6i1.392
Big data - a review in health sciences
  • Mar 15, 2016
  • Global Journal of Information Technology
  • Ayça Kurnaz Türkben + 3 more

Technologies are changing very fast, and data influence both technological change and global development. Data are obtained from social media, the Internet and mobile technologies. For years, academics, researchers and companies have drawn on such sources and analyzed them for their studies and work. The increasing use of mobile devices and social networks, together with the growth of electronic customer records in the public and private sectors, has led to an increase in data. The resulting massive volume of data is called big data. The literature offers many definitions of big data, but simply put, big data are data of massive size that can be obtained from any environment. One such environment is health care, where data have grown rapidly because the sector holds huge amounts of data, such as patients’ electronic health records. The health sector has high costs, and timing is critically important, so decisions must be made both quickly and correctly. In this context, the use of big data in health is important for improving the quality of service, enabling innovative health operations and reducing costs. This study presents a brief review of the literature from the last five years on the use of big data in the health sciences, discussing the content, methods, advantages and difficulties of big data.
Keywords: Health science, Big data, Medicine, data mining

  • Research Article
  • 10.23889/ijpds.v9i5.2813
The CanPath-HDRN Canada Collaboration: Enabling Multi-jurisdictional Research in Canada
  • Sep 10, 2024
  • International Journal of Population Data Science
  • Jennifer Brooks + 31 more

Objective: To highlight the partnership between the Canadian Partnership for Tomorrow’s Health (CanPath) and the Health Data Research Network (HDRN Canada), which enables researchers to link CanPath’s health and lifestyle survey data to health records and related data across multiple regions.
Background: CanPath is a population health study of over 350,000 Canadians from seven regional cohorts across all ten provinces, making it one of the world's largest population cohorts. HDRN Canada is a network of member organizations, including provincial, territorial and pan-Canadian data centres.
Approach: This partnership leveraged collective expertise and resources to facilitate the linkage of CanPath’s harmonized data to multi-regional health and health-related administrative data held at HDRN Canada’s data centres. Researchers can access these data for multi-regional projects through HDRN Canada’s Data Access Support Hub, which provides a single access portal.
Results: Researchers were able to successfully link CanPath’s survey data to provincial administrative health data. The collaboration’s streamlined data access process improves efficiency and facilitates pan-Canadian population health research.
Conclusion: This collaboration demonstrates the feasibility and value of linking population health datasets, while illustrating the challenges and opportunities associated with accessing national administrative health data within a federated health data system.
Implications: The partnership between CanPath and HDRN Canada has significant implications for advancing population health research in Canada. By providing researchers with access to linked data from diverse sources, the partnership enables comprehensive investigations into health determinants, disease patterns and clinical outcomes. This broadens the scope and depth of population health research in Canada, leading to a better understanding of the most pressing health challenges.

  • Research Article
  • Cited by 165
  • 10.1016/j.bdr.2015.02.002
Promises and Challenges of Big Data Computing in Health Sciences
  • Feb 18, 2015
  • Big Data Research
  • Tao Huang + 5 more


  • Research Article
  • Cited by 14
  • 10.1111/1752-1688.12568
Data Management Dimensions of Social Water Science: The iUTAH Experience
  • Sep 1, 2017
  • JAWRA Journal of the American Water Resources Association
  • Courtney G Flint + 2 more

Integrating social and hydrologic sciences for understanding water systems is challenged by data management complexities. Contemporary mandates for open science and data sharing necessitate better understanding of the implications of social science data types. In the context of an interdisciplinary water research program that endeavors to integrate and share social science and biophysical data, we highlight the array of data types and issues associated with social water science. We present a multi‐dimensional classification of social water science data that provides insight into data management considerations for each data type. Recommendations for cyberinfrastructure, planning, and policy are offered.

  • Book Chapter
  • 10.1007/978-981-16-1781-2_23
The Role of Big Data in Social Science: A Case Study Using Hadoop
  • Sep 10, 2021
  • Nour Alqudah + 1 more

Research on big data in the social sciences and its impact has attracted much interest from practitioners and policy-makers. Data science can help answer research questions in the social sciences, because data are the lifeblood of the decision-making process and the raw material of accountability. New data sources, technologies and analytical approaches can make evidence-based decision-making more efficient and flexible, and the analysis of such data plays a large role in addressing the challenges facing our societies today. This research analyzes six factors that influence happiness: GDP per capita, social support, life expectancy, freedom, corruption and generosity. It draws on the 2019 World Happiness Report, whose survey of the state of global happiness ranks 156 countries by their citizens’ happiness with reference to these six factors. The study focuses on analyzing the factors that affect happiness and life satisfaction.
