The tranSMART Foundation's inaugural Datathon took place June 30–July 2 at the Thomson Reuters offices in Boston, MA. Its overall aim was to determine whether the tranSMART platform, running on a customized cloud server, could be used to explore multiple large datasets and so support a Datathon capable of generating new research findings. The scientific goal was to identify similarities and differences between neurodegenerative diseases, specifically Alzheimer's disease and Parkinson's disease, and to discover new insights into these diseases. Specific objectives were to identify:
• Common biomarker changes across Parkinson's and Alzheimer's disease
• Common pathway changes across Parkinson's and Alzheimer's disease
• The normal distribution of imaging and fluid biomarkers across controls
• Novel hypotheses, research findings or conclusions about these neurodegenerative diseases

2. Design

The tranSMART Foundation, the Michael J. Fox Foundation, the University of Luxembourg and the University of Michigan worked together with the Laboratory of Neuro Imaging (LONI) to install tranSMART v1.2.4 on cloud servers at LONI and to load the 14 datasets used for the Datathon. The ADNI, PPMI, LRRK2 and BioFIND datasets were curated and loaded by Thomson Reuters, working with the Michael J. Fox Foundation. Because access to and redistribution of these four datasets are restricted, data use agreements were executed with the Alzheimer's Disease Neuroimaging Initiative and the Parkinson's Progression Markers Initiative, and for the LRRK2 dataset (led by the Michael J. Fox Foundation) and the BioFIND dataset. The remaining ten datasets, curated and loaded by the University of Luxembourg, originated in GEO and did not require data use agreements. This Datathon marked the first time that these datasets were made available in a single analytic platform. Together, the datasets represent an investment of over $500 million in data generation.
ADNI: The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a longitudinal, multicenter study to develop genetic, biochemical, clinical, and imaging biomarkers for the early detection and progression tracking of Alzheimer's disease. 3142 patients are currently enrolled.

PPMI: The Parkinson's Progression Markers Initiative is a longitudinal, multimodal observational study of a large patient population. The dataset contains biological sampling, advanced imaging, and clinical and behavioral assessments, including the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS), the Montreal Cognitive Assessment (MoCA) and the University of Pennsylvania Smell Identification Test (UPSIT). 1334 patients are currently enrolled.

LRRK2: The Michael J. Fox Foundation has established a LRRK2 Cohort Consortium to take an innovative approach to designing and streamlining drug development around the LRRK2 gene, a promising target. 2824 patients are currently enrolled.

BioFIND: A clinical observational study designed to discover and validate novel biomarkers for Parkinson's disease. 229 patients are currently enrolled.

10 Parkinson's Disease (PD) studies from GEO: These studies come from the NCBI Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/). To extract valuable knowledge from them, data scientists from the Luxembourg Centre for Systems Biomedicine (http://wwwen.uni.lu/lcsb), University of Luxembourg (http://wwwen.uni.lu), manually curated 10 PD studies from GEO, selected because they include a substantial amount of clinical data in addition to gene expression data. The studies were curated in the context of an ongoing Innovative Medicines Initiative (IMI) project and eTRIKS (http://www.etriks.org). Data from these studies passed through the following workflow: data acquisition, parsing, manual inspection, data standardization, and semantic alignment and mapping. The resulting structured files are ready to be used as input for the tranSMART ETL (Extraction, Transformation and Loading) operations.
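The curation workflow described above (parsing, standardization, semantic alignment and mapping, then writing a structured file) can be illustrated with a minimal Python sketch. This is not the curators' actual tooling: the sample records, the field and value mapping tables, the column names and the tab-delimited layout are all assumptions made for illustration, and the real input format expected by the tranSMART ETL scripts may differ.

```python
import csv
import io

# Hypothetical raw annotations, standing in for sample metadata acquired from GEO.
RAW_SAMPLES = [
    {"geo_accession": "SAMPLE_A",
     "characteristics": "disease state: Parkinson's disease; gender: M; age: 67"},
    {"geo_accession": "SAMPLE_B",
     "characteristics": "Disease: control; Sex: female; age (years): 71"},
]

# Hypothetical alignment tables: study-specific labels -> shared vocabulary.
FIELD_MAP = {
    "disease state": "diagnosis", "disease": "diagnosis",
    "gender": "sex", "sex": "sex",
    "age": "age_years", "age (years)": "age_years",
}
VALUE_MAP = {
    "diagnosis": {"parkinson's disease": "PD", "control": "Control"},
    "sex": {"m": "Male", "male": "Male", "f": "Female", "female": "Female"},
}
COLUMNS = ["subject_id", "diagnosis", "sex", "age_years"]


def parse(characteristics):
    """Parsing step: split 'key: value; key: value' text into a dict."""
    pairs = {}
    for item in characteristics.split(";"):
        key, _, value = item.partition(":")
        pairs[key.strip().lower()] = value.strip()
    return pairs


def standardize(sample):
    """Standardization and mapping step: rename fields and normalize values."""
    row = {"subject_id": sample["geo_accession"]}
    for key, value in parse(sample["characteristics"]).items():
        field = FIELD_MAP.get(key)
        if field is None:
            continue  # in practice, unmapped fields would go to manual inspection
        row[field] = VALUE_MAP.get(field, {}).get(value.lower(), value)
    return row


def to_tsv(rows):
    """Write the structured rows as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


structured = [standardize(sample) for sample in RAW_SAMPLES]
tsv_text = to_tsv(structured)
```

Keeping the mapping tables separate from the parsing code means that two studies using different labels for the same concept ("gender" and "Sex" here) end up in the same column, which is the point of the semantic alignment step.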
The structured files were loaded into tranSMART using Pentaho Kettle ETL scripts. The tranSMART Foundation, working with the University of Michigan and LONI, installed the platform on LONI cloud servers and coordinated the installation of the curated datasets onto these servers with Thomson Reuters, the Michael J. Fox Foundation and the University of Luxembourg. The latest tranSMART platform, v1.2.4, was employed for these efforts. Access to the databases allowed participants to evaluate whether modifications were needed to make the data more usable. Twenty-five scientists from leading institutions in the US and Europe were selected from a pool of over seventy applicants and organized into five teams.1 In addition to the tranSMART platform, various third-party analytic tools were employed, including MetaCore™, an R interface, Spotfire, E-Workbook, MATLAB, and REFS™.