Abstract

Discovering gene regulatory networks from gene expression data can yield useful insights for drug development. Among the methods applied to time-series data, Granger causality (GC) has emerged as a powerful tool with several merits. Because gene expression data usually contain far more genes than time points, a full model cannot be applied in a straightforward manner, so GC is often applied to genes in a pairwise fashion. In this study, the authors first use synthetic data to investigate how spurious causalities (false discoveries) may arise from the use of pairwise rather than full-model GC detection. Spurious causalities may also arise if the order of the vector autoregressive model is not high enough. As a remedy, the authors demonstrate that model validation techniques can effectively reduce the number of false discoveries. They then apply pairwise GC with model validation to the real human HeLa cell-cycle dataset. They find that the Akaike information criterion is generally the most suitable for determining model order, but caution should be taken with extremely short time series. With the authors' proposed implementation, degree distributions and network hubs are obtained and compared with existing results, yielding the new observation that hubs tend to act as sources rather than receivers of interactions.
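As a rough illustration of pairwise GC with AIC-based order selection, the following is a minimal sketch and not the authors' implementation. It assumes NumPy only, ordinary least squares with an intercept, and a toy two-variable system; the function names (`lagged`, `rss_and_nparams`, `select_order_aic`, `granger_f`, `simulate_pair`) and all coefficients are hypothetical. It returns an F statistic that would still need to be compared against a critical value of the F distribution with `(p, dof)` degrees of freedom, and it does not reproduce the model validation step described in the abstract.

```python
import numpy as np


def lagged(series, p, start):
    """Columns are series[t-1], ..., series[t-p] for t = start, ..., len(series)-1."""
    n = len(series)
    return np.column_stack([series[start - k : n - k] for k in range(1, p + 1)])


def rss_and_nparams(target, regressors):
    """Least squares with an intercept; returns (residual sum of squares, #params)."""
    X = np.column_stack([np.ones(len(target)), regressors])
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    return float(resid @ resid), X.shape[1]


def select_order_aic(cause, effect, max_order):
    """Pick the lag order of the bivariate model for `effect` that minimises AIC."""
    target = effect[max_order:]  # common sample so AIC values are comparable
    n_eff = len(target)
    best_p, best_aic = 1, np.inf
    for p in range(1, max_order + 1):
        X = np.column_stack(
            [lagged(effect, p, max_order), lagged(cause, p, max_order)]
        )
        rss, k = rss_and_nparams(target, X)
        aic = n_eff * np.log(rss / n_eff) + 2 * k
        if aic < best_aic:
            best_p, best_aic = p, aic
    return best_p


def granger_f(cause, effect, p):
    """F statistic for H0: lags of `cause` do not help predict `effect`."""
    target = effect[p:]
    own = lagged(effect, p, p)                       # restricted model: own lags
    both = np.column_stack([own, lagged(cause, p, p)])  # full pairwise model
    rss_r, _ = rss_and_nparams(target, own)
    rss_f, k_f = rss_and_nparams(target, both)
    dof = len(target) - k_f
    return ((rss_r - rss_f) / p) / (rss_f / dof)


def simulate_pair(n=200, seed=0):
    """Toy data (assumed) in which x drives y with a one-step delay, not vice versa."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    y = np.zeros(n)
    for t in range(1, n):
        x[t] = 0.5 * x[t - 1] + rng.normal()
        y[t] = 0.8 * x[t - 1] + 0.2 * y[t - 1] + 0.5 * rng.normal()
    return x, y


x, y = simulate_pair()
p_xy = select_order_aic(x, y, max_order=5)
p_yx = select_order_aic(y, x, max_order=5)
print("x -> y: order", p_xy, "F =", granger_f(x, y, p_xy))
print("y -> x: order", p_yx, "F =", granger_f(y, x, p_yx))
```

In this toy system the statistic for x -> y should be far larger than the one for y -> x. With a third variable driving both series, the same pairwise test could flag a spurious link, which is the failure mode the abstract describes.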

