The recent abundance of data from studies that use high-throughput technologies to reveal alterations in human disease at the genomic, transcriptomic, proteomic and other levels offers the possibility of integrating this information into a comprehensive picture of the molecular events occurring in human disease. The diversity of data originating from these studies presents a methodological obstacle to integration, in part because it is difficult to choose a common denominator that allows variables from different types of studies to be included. We present a novel approach for integrating such multi-origin data based on the genomic positions of genetic alterations occurring in human diseases. Parkinson's disease (PD) was chosen as a model for evaluating our methodology. Datasets from various types of PD studies (linkage, genome-wide association, transcriptomic and proteomic studies) were obtained from online repositories or extracted from available research papers. Subsequently, the human genome assembly was subdivided into 10 kb regions, and significant signals from these studies were assigned to their corresponding regions according to genomic position. For each region, rank product values were calculated, and significance values were estimated by permuting the original dataset. Altogether, 179 regions (representing 33 contiguous genomic regions) showed a significant accumulation of signals at a P-value cut-off of 0.0001. These regions contained 29 plausible candidate genes for PD. In conclusion, we present a novel approach for identifying candidate regions and genes for various human disorders, based on the positional integration of data across various types of omic studies.
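
The positional integration described above (10 kb binning, rank products, permutation-based significance) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how signals are scored, how ties and empty regions are ranked, or what exactly is permuted, so the choices below (strongest score per region, average ranks for ties, a score of 0 for regions without a signal, shuffling scores across regions within each study, a pooled null distribution) are assumptions made for illustration, and all function names are hypothetical.

```python
import math
import random
from bisect import bisect_right

BIN_SIZE = 10_000  # 10 kb regions, as stated in the abstract


def bin_signals(signals, bin_size=BIN_SIZE):
    """Assign (chrom, position, score) signals to fixed-size regions.

    Assumption: a region keeps the strongest score that falls into it.
    """
    binned = {}
    for chrom, pos, score in signals:
        key = (chrom, pos // bin_size)
        binned[key] = max(binned.get(key, 0.0), score)
    return binned


def build_matrix(binned_per_study):
    """Return (regions, matrix) with matrix[s][r] = score of region r in study s.

    Assumption: regions without a signal in a study get a score of 0.0.
    """
    regions = sorted(set().union(*binned_per_study))
    matrix = [[b.get(r, 0.0) for r in regions] for b in binned_per_study]
    return regions, matrix


def ranks_desc(scores):
    """Rank scores so that 1 is the strongest; ties share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_products(matrix):
    """Product of the per-study ranks for each region (smaller = stronger)."""
    per_study = [ranks_desc(row) for row in matrix]
    return [math.prod(col) for col in zip(*per_study)]


def permutation_pvalues(matrix, n_perm=1000, seed=0):
    """Estimate P-values by shuffling scores across regions within each study.

    Assumption: the null distribution is pooled over all regions and permutations.
    """
    rng = random.Random(seed)
    observed = rank_products(matrix)
    counts = [0] * len(observed)
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in matrix]
        null = sorted(rank_products(shuffled))
        for r, obs in enumerate(observed):
            counts[r] += bisect_right(null, obs)  # null values <= observed
    total = n_perm * len(observed)
    return [(c + 1) / (total + 1) for c in counts]
```

Regions whose estimated P-value falls below the chosen cut-off would then be reported as showing an accumulation of signals. Resolving a cut-off as small as 0.0001 requires a pooled null of well over 10,000 values, so `n_perm` would need to be scaled accordingly for real data.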