Abstract

This article proposes a new non-parametric approach for identifying risk factors and their correlations in epidemiologic studies, where study data may vary widely because of individual differences or correlated risk factors. First, using the classification of samples into high or low disease incidence, we estimate the Receiver Operating Characteristic (ROC) curve of each risk factor. Then, from the difference between each factor's ROC curve and the diagonal, we evaluate and screen for the important risk factors. In addition, based on the difference between the ROC curves of any pair of factors, we define a new type of correlation matrix that measures their correlation with respect to disease, and use this matrix as an adjacency matrix to construct a network, a visualization tool for exploring the structure among factors that can guide further studies. Finally, these methods are applied to an analysis of water pollutants and gastrointestinal tumors, and to an analysis of gene expression data from tumor and normal colon tissue samples.
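As a rough illustration of the first two steps, the sketch below computes an empirical ROC curve for one factor from high/low incidence labels and summarizes its deviation from the diagonal as a signed area (AUC minus 0.5). The function names, the toy data, and the choice of the signed area as the summary are assumptions made for illustration; the article's own screening statistic may differ.

```python
def empirical_roc(values, labels):
    """Empirical ROC points (FPR, TPR) for one factor.

    values: measurements of the factor, one per sample.
    labels: 1 for high disease incidence, 0 for low disease incidence.
    The threshold is swept from the largest value down; ties are grouped.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both incidence classes must be present")
    pairs = sorted(zip(values, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        v = pairs[i][0]
        # Consume every sample that shares the current threshold value.
        while i < len(pairs) and pairs[i][0] == v:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def area_above_diagonal(points):
    """Signed area between the ROC curve and the diagonal (AUC - 0.5),
    computed with the trapezoidal rule."""
    auc = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        auc += (x1 - x0) * (y0 + y1) / 2.0
    return auc - 0.5


# Toy data (synthetic, for illustration only).
toy_values = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_area = area_above_diagonal(empirical_roc(toy_values, toy_labels))
```

A factor unrelated to incidence should give an area near zero, while a strongly discriminating factor gives an area near 0.5 in absolute value.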

Highlights

  • Identification of possible risk factors of specific diseases in epidemiologic studies is helpful in guiding diagnosis, therapy or disease control

  • Some reports have suggested that high levels of polycyclic aromatic hydrocarbons (PAHs) in the air may be associated with cancer[25,26,27]

  • Because the data show high variability, complex structure among correlated factors, and individual differences, it is unreasonable to construct specific mathematical models directly to study the influence of risk factors on disease; the proposed methods, being non-parametric and free of strict mathematical conditions such as the normality or linearity assumed in classical statistical methods, are appropriate for exploring the relationship between various risk factors and disease incidence

Summary

Screening of risk factors based on ROC curve

We can simulate two independent Brownian bridges B1(t) and B2(t) using the relationship between the Brownian bridge and Brownian motion, and then construct the stochastic process δ0(t) by equation (5) and the statistic SA by equation (6). Repeating this process n times gives n simulated observations of SA, from which we obtain the empirical distribution of the test statistic SA under the null hypothesis H0, together with the corresponding rejection threshold. We can then carry out the hypothesis test to judge whether the ROC curve deviates significantly from the diagonal, which can be used to screen for the variables F ∈ 𝓕 that have an important impact on disease.
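The simulation loop described here can be sketched as follows. The Brownian bridge is built from Brownian motion through the standard identity B(t) = W(t) − t·W(1). Equations (5) and (6) are not reproduced in this summary, so `placeholder_statistic` is a hypothetical stand-in (the largest absolute gap between the two bridges) that only marks where the real δ0(t) and SA would be computed; `simulate_null`, its parameters, and the level `alpha` are likewise illustrative assumptions.

```python
import math
import random


def brownian_bridge(n_steps, rng):
    """Simulate a Brownian bridge on the grid t_k = k / n_steps,
    using B(t) = W(t) - t * W(1) with W a standard Brownian motion."""
    dt = 1.0 / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return [w[k] - (k / n_steps) * w[-1] for k in range(n_steps + 1)]


def placeholder_statistic(b1, b2):
    """Hypothetical stand-in for equations (5) and (6), which are not
    given here: the largest absolute gap between the two bridges."""
    return max(abs(x - y) for x, y in zip(b1, b2))


def simulate_null(n_rep, n_steps=200, alpha=0.05, seed=0,
                  statistic=placeholder_statistic):
    """Return the sorted simulated statistics and the empirical
    (1 - alpha) quantile, used as the rejection threshold under H0."""
    rng = random.Random(seed)
    sims = sorted(
        statistic(brownian_bridge(n_steps, rng), brownian_bridge(n_steps, rng))
        for _ in range(n_rep)
    )
    idx = max(0, min(n_rep - 1, math.ceil((1.0 - alpha) * n_rep) - 1))
    return sims, sims[idx]


null_sims, threshold = simulate_null(n_rep=500)
```

An observed statistic larger than `threshold` would lead to rejecting H0 at level `alpha`; passing a different callable as `statistic` swaps in the actual definition without changing the loop.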

Construction of network based on correlation matrix
[Figure: network of factors; recoverable node labels: Ba, Cd, Mn, Cr]
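To make the network construction concrete, the following sketch turns pairwise ROC-curve differences into a symmetric matrix and reads edges off it. The article's actual correlation matrix is not defined in this summary, so the similarity used here (one minus the mean absolute gap between two ROC curves sampled on a shared false-positive-rate grid), the cutoff, and the factor names F1 to F3 with their synthetic curves are all hypothetical choices for illustration.

```python
def roc_distance(tpr_a, tpr_b):
    """Mean absolute gap between two ROC curves sampled on the same
    false-positive-rate grid (hypothetical measure of difference)."""
    if len(tpr_a) != len(tpr_b):
        raise ValueError("curves must share the same grid")
    return sum(abs(a - b) for a, b in zip(tpr_a, tpr_b)) / len(tpr_a)


def similarity_matrix(curves):
    """Build a symmetric matrix with entries 1 - roc_distance, so that
    factors with nearly identical ROC curves score close to 1."""
    names = list(curves)
    matrix = [[1.0 - roc_distance(curves[a], curves[b]) for b in names]
              for a in names]
    return names, matrix


def edges_above(names, matrix, cutoff):
    """Treat the matrix as a weighted adjacency matrix and keep the
    edges whose weight reaches the cutoff."""
    return [(names[i], names[j], matrix[i][j])
            for i in range(len(names))
            for j in range(i + 1, len(names))
            if matrix[i][j] >= cutoff]


# Synthetic true-positive rates on the grid FPR = 0, 0.25, 0.5, 0.75, 1.
toy_curves = {
    "F1": [0.0, 0.6, 0.8, 0.9, 1.0],
    "F2": [0.0, 0.5, 0.8, 0.9, 1.0],
    "F3": [0.0, 0.25, 0.5, 0.75, 1.0],
}
toy_names, toy_matrix = similarity_matrix(toy_curves)
toy_edges = edges_above(toy_names, toy_matrix, cutoff=0.9)
```

With these toy curves only F1 and F2 are linked, because their ROC curves nearly coincide while F3 lies on the diagonal. The resulting edge list can be passed to any graph-drawing tool.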
Discussion
Additional Information