Abstract

Finding small, homogeneous subgroup cohorts in large, heterogeneous populations is a critical step in hypothesis development for biomedical research. Current computational approaches still lack robust answers to the question “what hypotheses are likely to be novel and to produce clinically relevant results with well thought-out study designs?” We have developed a novel subgroup discovery method that employs a deep exploratory mining process to slice and dice thousands of potential subpopulations and to prioritize potential cohorts based on their explainable contrast patterns, which may provide interventionable insights. We conducted computational experiments on both synthesized data and a clinical autism data set to assess performance quantitatively, by coverage of pre-defined cohorts, and qualitatively, by novel knowledge discovery, respectively. We also conducted a scaling analysis in a distributed computing environment to estimate the computational resources needed as the number of subpopulations increases. This work will provide a robust data-driven framework to automatically tailor potential interventions for precision health.

Highlights

  • Much of successful biomedical research relies on identifying key predictive factors within specific populations (manuscript received February 24, 2019; revised June 1, 2019 and July 23, 2019; accepted August 25, 2019)

  • To bridge the knowledge gap, in this paper we introduce a unique exploratory mining approach, shown in Fig. 1, that enables the broad biomedical research community to answer the following question: Which subgroups of patients might benefit from interventions that are likely to be effective for the selected populations? Our contribution is a suite of computational methods, pipelined in a distributed computing environment, that identify and prioritize cohorts of patient subpopulations and reveal explainable contrast patterns for potential interventions

  • Taking advantage of the advancement of computing power, we have developed the Guided Cascading Shotgun (GCS) approach to explore hundreds to thousands of potential subgroup cohorts which are comparably valuable during the Floating Contrast Subgroup Selection process


Summary

INTRODUCTION

Much of successful biomedical research relies on identifying key predictive factors within specific populations. By finding meaningful and homogeneous subgroups prior to conducting clinical trials, researchers can further study focused populations and identify potential risk factors from complex data sources to create tailored treatments [4]. There are two major barriers to such tailored care: the effort required to identify meaningful subgroups of patients for clinical trials and outcome research, and the high cost of developing interventions for such small populations. The impact of this work is to allow researchers and clinicians to intelligently slice and dice through hundreds of thousands of potential subgroups and focus on only those subgroups that are evidence-based, data-driven, and statistically significant, with actionable potential. We believe this capability will enable the biomedical research community to acquire advanced medical knowledge and produce innovative treatments at a much faster pace than is currently possible.
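To make "slicing and dicing" a population into candidate subgroups more concrete, the following is a minimal Python sketch under stated assumptions. It is not the paper's Guided Cascading Shotgun procedure or its J-value prioritization, neither of which is specified in this summary. It simply enumerates attribute-value conjunctions and ranks them by a plain difference in outcome rate between a subgroup and the rest of the population. The function names, the toy records, the attribute names, and the `min_size` threshold are all hypothetical, and no statistical significance test is applied.

```python
from itertools import combinations


def enumerate_subgroups(records, attributes, max_depth=2):
    """Yield (description, members) for each attribute-value conjunction
    of up to max_depth attributes. Hypothetical helper for illustration."""
    for depth in range(1, max_depth + 1):
        for attrs in combinations(attributes, depth):
            # Group records by their values on the chosen attributes.
            groups = {}
            for rec in records:
                key = tuple((a, rec[a]) for a in attrs)
                groups.setdefault(key, []).append(rec)
            yield from groups.items()


def rank_by_contrast(records, attributes, outcome, min_size=2, max_depth=2):
    """Rank subgroups by the absolute difference between the outcome rate
    inside the subgroup and the outcome rate in the rest of the population.
    This simple score is an assumption, not the paper's J-value."""
    total_pos = sum(r[outcome] for r in records)
    ranked = []
    for desc, members in enumerate_subgroups(records, attributes, max_depth):
        rest = len(records) - len(members)
        if len(members) < min_size or rest == 0:
            continue  # skip tiny subgroups and the whole population
        pos_in = sum(r[outcome] for r in members)
        rate_in = pos_in / len(members)
        rate_out = (total_pos - pos_in) / rest
        ranked.append((abs(rate_in - rate_out), desc, len(members)))
    ranked.sort(key=lambda t: t[0], reverse=True)
    return ranked


# Toy, made-up records: 1 means the (hypothetical) outcome was observed.
records = [
    {"sex": "F", "age": "young", "improved": 1},
    {"sex": "F", "age": "young", "improved": 1},
    {"sex": "F", "age": "old", "improved": 0},
    {"sex": "M", "age": "young", "improved": 0},
    {"sex": "M", "age": "old", "improved": 0},
    {"sex": "M", "age": "old", "improved": 0},
]

ranked = rank_by_contrast(records, ["sex", "age"], "improved")
for score, desc, size in ranked:
    print(f"{score:.2f}  n={size}  {dict(desc)}")
```

Even in this toy setting the number of candidate subgroups grows combinatorially with the number of attributes and the conjunction depth, which is the kind of growth that motivates running the exploration in a distributed computing environment.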

RELATED WORKS
DATA MAPPING
DEEP EXPLORATORY MINING
Floating and Path Expansion
Contrast Pattern Mining
Subgroup Prioritization Using J-Value
DISTRIBUTED COMPUTING ALGORITHMS
EXPERIMENTS
Synthetic Data – Cohort Coverage and Computing Resources Assessments
Autism Data Set – Novel Discovery Assessment
Findings
CONCLUSION
