Articles published on Data hub
518 Search results
- New
- Research Article
- 10.30574/wjarr.2026.30.2.1167
- May 31, 2026
- World Journal of Advanced Research and Reviews
- Omotoso Samuel Sunday + 1 more
Fraud and improper payments across U.S. federal benefits programs impose substantial fiscal and social costs, with the Government Accountability Office estimating annual fraud-related losses of $233–521 billion and cumulative improper payments exceeding $2.7 trillion since fiscal year 2003. Traditional "pay-and-chase" recovery models have proven insufficient, prompting growing interest in preventive strategies that leverage cross-agency data fusion and predictive intelligence. This paper presents a qualitative comparative analysis of four dominant institutional and technical models used to integrate data and apply predictive analytics for fraud reduction: centralized federal data hubs, federated and privacy-preserving architectures, vendor-managed identity and fraud detection systems, and state-level or program-specific predictive analytics. Each model is evaluated against five criteria derived from federal oversight guidance and academic literature: legal and institutional authority, data governance and privacy protection, technical architecture and scalability, analytic effectiveness, and transparency and accountability. The analysis demonstrates that no single model adequately balances these competing demands. Centralized hubs offer statutory grounding but limited analytic flexibility; federated approaches strengthen privacy at the cost of governance complexity; vendor systems raise accountability concerns; and program-specific models lack cross-jurisdictional reach. The paper proposes a hybrid framework combining centralized authoritative checks, federated analytics for sensitive data, and program-level predictive modeling under unified governance with human-in-the-loop decision-making. Policy recommendations address statutory authorization, oversight structures, model validation, vendor accountability, and workforce capacity to support responsible, equitable, and scalable fraud prevention.
- New
- Research Article
- 10.1136/ip-2025-045681
- May 15, 2026
- Injury prevention : journal of the International Society for Child and Adolescent Injury Prevention
- Lola Ortiz-Whittingham + 10 more
Introduction: Firearm injuries and fatalities adversely affect individual and community health and are important public health issues. Identifying factors that protect against firearm violence and related harms is needed. We aimed to identify neighborhoods with lower- and higher-than-expected firearm injuries and fatalities based on social vulnerability and their differing structural and social characteristics. Methods: We used Gun Violence Archive data from the RISE Lab Firearm Injury Data Hub to estimate firearm injury and fatality counts between 2015 and 2020 in Pittsburgh, Pennsylvania. We used a negative binomial regression model to estimate the relationship between the CDC’s Social Vulnerability Index and firearm injuries and fatalities and the residual percentile method to identify neighborhoods with lower (resilient) and higher (at-risk) counts than expected based on social vulnerability. T-tests were used to compare resilient and at-risk neighborhoods for 158 US Census American Community Survey demographic, socioeconomic, housing, and transportation variables. Results: Resilient neighborhoods (N=19, bottom quartile residuals) had lower rates of firearm injuries and fatalities (p=0.0002) compared to at-risk neighborhoods (N=19, top quartile residuals). Resilient and at-risk neighborhoods differed for 2 (percent male never married and female widowed) of the variables included in analyses. Conclusions: Some neighborhoods, despite facing risk factors, experienced fewer firearm injuries and fatalities than would be expected based on neighborhood social vulnerability. Existing sources of secondary data on neighborhoods may fail to adequately capture potential factors that protect against firearm violence and to measure resilience in thriving neighborhoods. Future community-engaged studies are needed to understand and measure neighborhood-level protective factors.
- Research Article
- 10.1182/bloodadvances.2025019450
- May 12, 2026
- Blood advances
- Najibah Galadanci + 5 more
Novel use of a hash-based tokenization for sharing sickle cell disease data without sharing protected health information.
- Research Article
- 10.55041/isjem07139
- May 5, 2026
- International Scientific Journal of Engineering and Management
- Dr Umesh Pawar + 5 more
Abstract: Managing professional information across multiple online and offline platforms has become increasingly difficult for students, job seekers, and working professionals. Traditional resume creation, document verification, and portfolio management often require repetitive efforts and lead to inconsistencies. Profile Hub aims to provide a centralized, organized, and secure digital platform to store, manage, and share all professional data in one place. The system consolidates documents, certificates, achievements, skills, personal information, and work experience into a single dynamic profile. It enables automated resume generation, cloud-based document storage, quick sharing via a unique digital link, and verification-ready digital identity. Profile Hub eliminates redundancy, improves accessibility, and creates a streamlined solution for modern digital career management. Index Terms: Professional Data Hub, Digital Portfolio, Resume Automation, Cloud Storage, Profile Management System
- Research Article
- 10.1182/bloodadvances.2025017562
- Apr 28, 2026
- Blood advances
- Alexis A Thompson + 10 more
High concordance of physician-attestation with manual data abstraction for sickle cell type: an ASH RC Data Hub study.
- Research Article
- 10.1186/s12919-026-00362-8
- Apr 23, 2026
- BMC proceedings
- Gideon Kwarteng Acheampong + 18 more
In a context of increasing efforts towards the establishment of a Regional Health Data Hub for the African Region, the 2024 West African Policy Dialogue brought together researchers and policymakers from seven West African countries in a two-day meeting in Aburi, Ghana. This report provides a high-level summary of the discussions at the meeting. The forum emphasized that the use of poor, incomplete, or inaccurate data will have negative consequences, regardless of the sophistication of the analytic tools used. New technologies have emerged that can support the generation and effective use of data. Yet, governments in West-Africa struggle to maximize the benefits of these technologies, including genomic surveillance, real-time data generation, and supranational data integration and exchange. Policies are needed that support and regulate new technologies and contribute to greater capabilities for better data.
- Research Article
- 10.12688/f1000research.179326.1
- Apr 13, 2026
- F1000Research
- Daniel Strech
Background Operators of health data hubs and registries enable secondary use by implementing consent processes, privacy safeguards, access governance, data quality management, transparency, and stakeholder engagement. Ethical guidance is extensive, yet it rarely specifies operator-level practices or indicators that show whether governance performs well in routine use. This paper proposes an operator-focused framework that makes ethics implementable and assessable. Methods The framework synthesises international guidance, systematic reviews and other empirical work on governance gaps, and publicly available governance materials from major data infrastructures. Ethics practices were selected for conceptual distinctness and operator-level controllability, grouped into seven governance domains, and aligned with a PDCA (Plan–Do–Check–Act) cycle. For each practice, the framework provides illustrative audit items, descriptive metrics, and outcome indicators intended as adaptable references rather than prescriptive standards. Results Seven domains are covered: Informed Consent & Information, Privacy & Data Protection, Data Quality, Use & Access Governance, Incidental Findings, Stakeholder Engagement, and Project-Level Transparency. Each domain contains four PDCA-aligned practices with linked evaluation items. Discussion The framework shifts practical bioethics for secondary use from policy existence to demonstrable performance. It can support operator self-audit and proportionate oversight by funders, ethics committees, and stakeholder bodies, while remaining modular across legal and infrastructural contexts. It also provides a starting point for developing minimal, transparent reporting expectations to support trust-building governance.
- Research Article
- 10.1007/s00330-025-12031-z
- Apr 1, 2026
- European radiology
- Ahmed Marey + 6 more
Artificial intelligence (AI) promises to accelerate and democratize medical imaging, yet low- and middle-income countries (LMICs) face distinct barriers to adoption. This perspective identifies those barriers and proposes an action-oriented roadmap. Insights were synthesized from a Johns Hopkins Science Diplomacy Hub workshop (18 experts in radiology, AI, and health policy) and a scoping review of peer-reviewed and grey literature. Workshop discussions were transcribed, thematically coded, and iteratively validated to reach consensus. Five interlocking barriers were prioritized: (1) infrastructure gaps (scarce imaging devices, unstable power, and limited bandwidth); (2) data deficiencies (small, non-representative, or ethically constrained datasets); (3) workforce shortages and brain drain; (4) uncertain ethical, regulatory, and medicolegal frameworks; and (5) financing and sustainability constraints. Case studies from Nigeria, Uganda, and Colombia showed that low-field MRI, cloud-based PACS, community-engaged data collection, and public-private partnerships can successfully mitigate several of these challenges. Targeted policy levers, including shared procurement of low-cost hardware, regional AI and data hubs, train-the-trainer workforce programs, and harmonized regulation, can enable LMIC health systems to deploy AI imaging responsibly, shorten diagnostic delays, and improve patient outcomes. Lessons are transferable to resource-constrained settings worldwide. Question How can LMICs overcome infrastructure, data, workforce, regulatory, and financing barriers to implement artificial-intelligence tools in clinical medical imaging? Findings Our multinational consensus identifies five obstacles and maps each to actionable levers: low-cost hardware, regional data hubs, train-the-trainer schemes, harmonized regulation, blended financing.
Clinical relevance Implementing these targeted measures enables LMIC health systems to deploy AI imaging reliably, shorten diagnostic delays, and improve patient outcomes while reducing dependence on external expertise.
- Research Article
- 10.1016/j.ijdrr.2026.106096
- Apr 1, 2026
- International Journal of Disaster Risk Reduction
- Luis Caetano + 4 more
Climate adaptation planning and risk transfer mechanisms rely on accurate, interoperable loss data. However, integrating heterogeneous datasets from insurers, public authorities and academic organizations remains a major challenge due to inconsistent formats, definitions, and privacy constraints. This study proposes a semantic approach to harmonize and operationalize climate-related economic loss data. We introduce the SOTERIA Ontology, designed to standardize key entities, such as assets, hazards, damages, and claims, thereby enabling semantic interoperability across diverse sources. Using Norwegian address-level insurance data, we populated a knowledge graph and developed a web-based prototype for data visualization and exploration. The prototype supports granular queries, interactive visualization, and automated Risk Data Hub (RDH) compliant aggregation. The results obtained through the prototype highlight clear spatial and exposure-related patterns in loss distribution. Flood damages were concentrated in low-lying riverine municipalities, while storm damages were more prevalent in exposed coastal regions. The likelihood of flood damage decreased with increasing distance from mapped flood hazard areas, storm damage rates increased with wind exposure, and average flood damage costs rose with building footprint size. These results demonstrate the potential of ontology-driven systems to enhance data quality, enable advanced analytics, and support evidence-based climate adaptation strategies. This work delivers the first ontology-based framework and prototype specifically designed to integrate insurance-based economic loss data into European risk governance workflows aligned with the RDH.
• We present an ontology-driven approach to integrate heterogeneous climate-related economic loss data for improved interoperability.
• The SOTERIA Ontology and knowledge graph enable advanced querying, visualization, and Risk Data Hub–aligned aggregation.
• A Norwegian case study demonstrates the feasibility of semantic integration using detailed insurance claims and exposure datasets.
• Our framework enhances data quality and supports evidence-based climate adaptation strategies and risk-informed decision-making.
- Research Article
- 10.1016/j.rehab.2025.102075
- Apr 1, 2026
- Annals of physical and rehabilitation medicine
- Jonathan Levy + 8 more
People with multiple sclerosis (PwMS) are ageing and exposed to multiple impairments. We described the use of health care procedures in a physical and rehabilitation medicine setting for PwMS ≥70 years and identified specific participant clusters. In this observational cohort study using a local data hub (2012 to March 2024), PwMS were identified with ICD-10 codes (G35). An age filter (≥70) was applied. Medical procedures were systematically coded according to a specific French classification over time, including dates of occurrence, and then extracted and grouped by impairment domains: upper motor neuron syndrome and orthopedic deformities (UMN-OD), respiratory and sleep disorders (RSD), and neurogenic lower urinary tract dysfunction (NLUTD). Clinical data were retrospectively collected from medical files. We conducted descriptive analyses and multiple-component and clustering analyses to identify and characterize the participants' profiles. Among 206 participants aged 75.7 (4.3) years, 62% (128) were women, 19% (39) had died, and MS had evolved for 43.3 (12.4) years. The Expanded Disability Status Scale (EDSS) was 7.5 (6.5-8.5), and the Charlson Comorbidity Index (CCI) was 6 (5-7). A total of 2794 procedures were performed for 187 participants, mainly in NLUTD (1424 for 170), which was associated with RSD (P = 0.001), but not UMN-OD (P = 0.262). Three groups were identified (Group 1 = isolated NLUTD, Group 2 = RSD and intrathecal baclofen procedures, and Group 3 = no in-hospital management). People in Group 2 had more severe comorbidities (P < 0.001) than those in the other two groups (P = 0.016). In PwMS ≥70 years, the hospital management of MS-related impairments was directly associated with disease severity and overall comorbidities. Participants who were followed only for NLUTD exhibited severity and comorbidity profiles similar to those of participants who did not require hospital procedures. 
Our institutional health data warehouse and all derived extracted databases within the scope of the health team perimeter are approved by the French Data Protection Authority under the No. 1980120.
- Research Article
- 10.1002/acr.80026
- Feb 24, 2026
- Arthritis care & research
- Lucinda Roper + 7 more
Systemic lupus erythematosus (SLE) is a heterogeneous inflammatory condition, with widely varying global prevalence estimates. Frequency of SLE in the general population of Australia has been reported to be notably lower than contemporary estimates in countries such as the US or UK, at 19-39 per 100,000 as opposed to 65-97 per 100,000. This study aimed to develop a national SLE cohort using linked administrative datasets and to estimate prevalence using this approach. We developed an algorithm to identify SLE cases using the National Health Data Hub, a linked national administrative data asset, including data on medications dispensed and medical services subsidised under Australia's universal subsidised coverage schemes, hospital admissions and deaths. These classification criteria were developed with an expert panel of rheumatologists. Individuals were classified as 'certain', 'uncertain', or 'no SLE'. Cohort characteristics were described, and period prevalence (2010-2022) was calculated. A cohort of 26,788 individuals with SLE was identified, consisting of 16,294 certain cases and an additional 10,494 uncertain cases. The period prevalence of SLE in Australia from 2010-2022 was 77-127/100,000 (certain cases or overall cases). Among the certain cases, just over half had received care for SLE only in an outpatient setting. This is the first Australian SLE study using comprehensive linked administrative health data and identified higher prevalence than previously reported, more closely aligning with international estimates. This algorithm may serve as a foundation for future Australian and international studies seeking to identify SLE in administrative health datasets.
- Research Article
- 10.1080/02665433.2026.2613119
- Feb 5, 2026
- Planning Perspectives
- Sarah Williams + 1 more
ABSTRACT International infrastructure projects are deeply ensconced in debates about power, intentions, and imaginaries of progress that also stir debate about what it means to decolonize international relations and projects that nominally build improvements in low-income environments. Traditional prioritization of technical expertise, the criticality of contextual knowledge, and the dynamics of uneven partnerships involved are all central elements of these pronounced tensions in the practice of international infrastructure development. This paper instead describes an international infrastructure project that is centred on community stewardship and co-design. The infrastructure project Living Data Hubs (LDH), highlighted here, is a small-scale information and communication technology (ICT) and data management project in the informal settlement of Kibera in Nairobi, Kenya. In its development, equal attention was afforded to technical, conceptual, and procedural knowledge production, ensuring that both universal and contextual elements of the ICT infrastructure and data management were explicitly discussed and valued. We argue that international partnerships like LDH that give space to technical, conceptual, and procedural capacity building alike can produce community stewarded infrastructure that is sustainable and puts forward a pathway for diminishing uneven power relations and enhancing community well-being.
- Research Article
- 10.1162/99608f92.8c9d8a4d
- Jan 30, 2026
- Harvard Data Science Review
- Vanessa Bouché + 3 more
Human trafficking (HT) remains one of the most complex and under-documented forms of transnational crime, in part due to persistent gaps in data quality, governance, standardization, and scope. This paper presents a comprehensive framework for building a Human Trafficking Data Ecosystem that addresses these limitations through three interconnected components: a data trust, a data hub, and a data taxonomy. Drawing from models in public health, social science, and data governance, the authors introduce the concept of a HT Data Trust to establish a shared governance and legal framework for ethical data stewardship; a HT Data Hub–exemplified by the Lighthouse platform powered by Allies Against Slavery–to centralize and standardize disparate data sources; and a HT Data Taxonomy to organize, evaluate, and strategically expand the field’s data architecture. This taxonomy is rooted in intersecting theoretical models—market conditions, intervention strategies, and the social-ecological framework—and is operationalized using a quantitative scoring system to assess dataset quality and coverage. The resulting heat map analysis identifies critical strengths and gaps across the anti-trafficking data landscape, offering both diagnostic insight and a strategic blueprint for improving data collection, integration, and use. By providing a replicable model for building a rigorous, ethical, and survivor-centered data infrastructure, this paper advances the field’s capacity to support evidence-based policy, targeted interventions, and systemic change.
- Research Article
- 10.1093/genetics/iyaf266
- Jan 19, 2026
- Genetics
- Nicholas Gladman + 23 more
Centralizing valuable community data and resources into a user-friendly interface and accessible repository has become essential for agricultural science; embracing Findable, Accessible, Interoperable, and Reusable (FAIR) principles is now standard for effective databases. SorghumBase (https://www.sorghumbase.org) is a knowledgebase designed for the sorghum research community. The SorghumBase team curates genomic, transcriptomic, variation, and phenotypic information and aggregates community events, providing rich visualizations and bulk data access. The modular framework of the database is built with open-access software to yield a robust, modifiable, and sustainable data infrastructure. Release 9 of SorghumBase includes: (i) 88 sorghum reference genomes and an updated pan-gene index, (ii) over 100 million variants mapped onto 2 genomes, BTx623 and Tx2783, (iii) assignment of 41 million Reference Cluster SNP identifiers (rsIDs) from BTx623 across the pan-genome, (iv) updated gene search homology, gene expression, and germplasm visualizations and features, (v) added and standardized 234 phenotypic data from 40 community-generated GWAS studies and 148 traits from the Sorghum QTL Atlas (Oz Sorghum), (vi) improved news, funding, and a research content management system for community access and interaction, (vii) outreach materials including training documents and videos, and (viii) community engagement initiatives through training and working groups. SorghumBase serves as a hub for sorghum data and stakeholder engagement while promoting community standards to drive research and multi-omics breeding approaches.
- Research Article
- 10.1136/jech-2025-225202
- Jan 13, 2026
- Journal of epidemiology and community health
- Jemar R Bather + 8 more
We investigated whether ghost gun recovery rates are significantly associated with firearm mortality rates in the following year across California's 58 counties from 2014 to 2023. We obtained yearly county-level data on ghost guns recovered in California from The Trace's Gun Violence Data Hub. County-level firearm death counts (total, suicide and homicide) were pulled from the Centers for Disease Control and Prevention's Restricted-Use Vital Statistics Data. Spatiotemporal models quantified the covariate-adjusted associations between ghost gun recoveries per capita and firearm death rates (total, suicide and homicide) in the following year. Secondary analyses examined suicide and homicide models stratified by sex and race/ethnicity. Results: For every 20 ghost guns recovered per 100 000 population, there was an associated 6.4% increase in firearm suicide rate (adjusted incidence rate ratio (aIRR): 1.064, 95% credible interval (CrI) 1.019 to 1.111) in the following year. We found no evidence of a significant ghost gun recovery association with total firearm death rate (aIRR: 1.036, 95% CrI 0.999 to 1.075) and firearm homicide rates (aIRR: 1.002, 95% CrI 0.946 to 1.064). Stratified models for firearm suicide rates suggested variations across sex and racial/ethnic groups, with significant positive associations observed for male (6.5% increase; aIRR: 1.065, 95% CrI 1.017 to 1.115), non-Hispanic white (6.2% increase; aIRR: 1.062, 95% CrI 1.005 to 1.122) and Hispanic (12.6% increase; aIRR: 1.126, 95% CrI 1.031 to 1.230) individuals. A different pattern emerged for firearm homicide death rates, where associations across demographic groups were not statistically significant. Practitioners concentrating on suicide prevention efforts should be advised about the threat that ghost guns may present.
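As a reading aid for the figures in this abstract, the sketch below shows the usual arithmetic linking an incidence rate ratio to the percent change it implies, and the common convention of calling an association notable when the 95% credible interval excludes 1. The function names are hypothetical and are not taken from the study; the numbers are the ones reported in the abstract.

```python
def percent_change(irr: float) -> float:
    """Percent change in a rate implied by an incidence rate ratio (IRR)."""
    return round((irr - 1.0) * 100.0, 1)


def interval_excludes_one(lower: float, upper: float) -> bool:
    """True when the interval lies entirely above or entirely below 1."""
    return lower > 1.0 or upper < 1.0


# (aIRR, lower 95% CrI, upper 95% CrI) as reported in the abstract
estimates = {
    "firearm suicide": (1.064, 1.019, 1.111),
    "total firearm death": (1.036, 0.999, 1.075),
    "firearm homicide": (1.002, 0.946, 1.064),
}

for outcome, (airr, lower, upper) in estimates.items():
    print(
        f"{outcome}: {percent_change(airr)}% change, "
        f"interval excludes 1: {interval_excludes_one(lower, upper)}"
    )
```

Only the firearm suicide interval lies entirely above 1, which matches the abstract's statement that the total and homicide associations were not significant.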
- Research Article
- 10.25159/2663-659x/20930
- Jan 9, 2026
- Mousaion: South African Journal of Information Studies
- Mpilo Siphamandla Mthembu + 1 more
Research data management (RDM) has emerged as a significant and evolving field in Information Science and Knowledge Management, supported by several theoretical models guiding its practice in data-intensive institutions such as universities, libraries, data hubs and research centres. This article, drawn from a larger study on RDM in selected South African public universities, proposes a framework to address the absence of a nationally accepted model. The post-positivist research paradigm was used in the study, employing both qualitative and quantitative methodologies. Semi-structured interviews with 23 study participants were conducted online via Microsoft Teams to collect qualitative data. The questionnaires were converted into Google Forms and emailed to 30 researchers rated by the National Research Foundation to collect the quantitative data. Although existing frameworks such as the community capability model framework, data audit framework, collaborative assessment of research data infrastructure and objectives model, and the Digital Curation Centre lifecycle model are largely applied in developed contexts, this study highlights the need for a locally relevant framework. The proposed model emphasises collaboration among universities and key stakeholders, policy development, and the possible setting up of a centralised national research database. The framework is expected to inform RDM practice in South Africa and provide insights for comparative studies internationally.
- Research Article
- 10.7759/cureus.100983
- Jan 7, 2026
- Cureus
- Fatima A Almeleh + 5 more
Background Premarital screening and genetic counseling are key preventive measures for reducing hereditary conditions in regions with high consanguinity, such as the United Arab Emirates. Understanding the knowledge, attitudes, and practices of couples attending premarital clinics is essential for improving service uptake and guiding targeted educational strategies. Methodology A multi-center, cross-sectional study was conducted from April to June 2025 across 23 premarital clinics within Emirates Health Services (EHS). Individuals aged 18 years or older who spoke Arabic or English and provided consent were included. Data were collected using a validated questionnaire administered through the EHS Data Hub. Descriptive and inferential analyses were performed to assess factors associated with knowledge, attitudes, and practices. Results A total of 255 participants were included; 66.7% were male, and 54.1% were aged 30-45 years. More than half were university graduates and employed. Good knowledge was reported by 77.3% of participants, positive attitudes by 92.2%, and good practice by 58.0%. Lower knowledge levels were more common among younger participants, males, and those with lower educational attainment. Females had higher odds of good knowledge (odds ratio (OR) = 2.57, 95% confidence interval (CI) = 1.26-5.27). Participants older than 45 years had higher odds of reporting good practice (OR = 4.15, 95% CI = 1.20-14.34). Good knowledge (OR = 2.27) and positive attitudes (OR = 24.09) were significant predictors of good practice. Conclusions Couples demonstrated strong knowledge and attitudes toward premarital screening and genetic counseling, although practice levels were comparatively lower. Targeted educational initiatives, especially for younger individuals, are needed to enhance engagement and bridge the knowledge-practice gap.
- Research Article
- 10.2139/ssrn.6501698
- Jan 1, 2026
- SSRN Electronic Journal
- Md Hashibul Hassan + 3 more
Data-Hub for Effective Local Government Service Delivery in Bangladesh: A Case Study of Union Parishads
- Research Article
- 10.12688/openreseurope.23235.1
- Jan 1, 2026
- Open research Europe
- Iliya Dauda Kwoji + 13 more
BlastoDB (https://www.blastodb.com/) is developed as an open-access, community-driven resource dedicated to Blastocystis, one of the most common yet understudied intestinal protists. BlastoDB will offer the scientific community up-to-date, curated information on Blastocystis by integrating epidemiological data, microbiome profiles, multi-omics datasets (genomics, transcriptomics, proteomics, and metabolomics), reference sequences for subtypes, protocols, microscopy images, and related metadata. In this initial release, we describe the data model, database architecture, curation pipelines, and web interface, which together facilitate subtype classification, comparative and integrative analyses, and cross-study synthesis of epidemiological and experimental data. We outline submission and governance workflows designed to support community contributions, training activities, and sustainable curation under the "Blastocystis under One Health" COST Action (CA21105). Finally, we highlight planned extensions, including expanded metagenomic and metatranscriptomic content, automated genome quality assessments, metagenome-assembled genomes, and geospatial and analytical dashboards. BlastoDB provides a central, FAIR-aligned hub for Blastocystis data, images, and protocols, reducing technical barriers and fostering a collaborative ecosystem for studying this globally prevalent protist.
- Research Article
- 10.3389/fpsyt.2026.1760116
- Jan 1, 2026
- Frontiers in psychiatry
- Linda A Jones + 17 more
Routinely collected health data, such as that held by United Kingdom (UK) national health services (NHS), has important research uses. However, its use requires public trust and transparency. Access by commercial/industrial organisations is especially sensitive for the public, as is mental health (MH) data. Although existing MH data science guidelines emphasise patient/public involvement (PPI), they do not cover commercial uses specifically. We aimed to develop patient- and public-led guidelines for the commercial and industrial use of MH data for research. Though UK-focused, their principles may apply internationally. A PPI Lived Experience Advisory Group (LEAG) was created within DATAMIND, a UK data hub for MH informatics. Initial discussion yielded a requirement for definitions and explanations of concepts relating to MH data research, developed iteratively. Subsequently, the LEAG developed guidelines via a qualitative quasi-Delphi approach. The agreed scope excluded data provided for research with informed consent, data processing arrangements (e.g. companies hosting electronic systems on the instruction of health services), and compliance with legal minimum requirements. The scope included the use of routinely collected MH data for research by commercial/industrial organisations without explicit consent, and aspects of industry-led MH data collection conducted with consent. Alongside the primer in MH data research concepts, the LEAG provide best-practice guidelines relating to commercial/industrial research use of MH data, for organisations controlling MH data (such as NHS bodies) and for commercial applicants seeking access. Core principles include transparency, patient rights, meaningful PPI, stringent governance, and statistical disclosure control.
The guidelines recommend a risk-benefit approach to assessing data access applications, within limits that include avoiding the export of unconsented patient-level data outside NHS-controlled secure data environments, and not providing commercial applicants with access to unconsented free-text MH data. Further recommendations for NHS executive and regulatory bodies relate to public choice and transparency, clarity of guidance to research-active NHS organisations, and support for de-identification. MH data research requires patient/public involvement and understanding. These guidelines reflect the views of people with personal or family experience of mental ill health. We hope they are useful to the MH research community and increase public transparency and trust.