Abstract

In recent years, biomedical, genetics, and genomics research has been undergoing a major change. Genomics research in particular is turning into a data-driven science, owing to the rapid evolution of automated data sensing technologies and data analysis methods. With the advent of new wet-lab technologies such as the synthetic genetic array technique, large-scale knockout experiments covering thousands of genes have made a vast amount of data available. With the knockout data at hand, a major challenge is the analysis of these data, commonly referred to as data mining or knowledge engineering. One important aspect of genomics research is the investigation of how genetic interactions influence the cell metabolism of organisms. For this purpose, identifying genetic interactions with respect to a specific cell function is of high significance. Interactions among a collection of genes are well described by genetic interaction networks, represented as directed graphs. Given the importance of understanding the influence of genetic interactions on the cell metabolism, the problem of learning genetic interaction networks, which reflect the mutual genetic dependencies among a set of genes, has recently attracted much attention. This dissertation focuses on developing graph learning algorithms tailored to the special structure of genetic interaction networks. The main contribution of this work is the formulation of the graph learning problem as integer linear programs that estimate the genetic interaction network underlying the knockout data. The two proposed integer linear program based formulations differ in emphasis, since they estimate the network topology in different representation domains.
Both methods have the advantage over conventional graph learning methods such as Graphical Lasso that their estimates are guaranteed by design to have the desired network structure, which is imposed by the specific biological interaction model studied in this thesis. Due to their intrinsic combinatorial nature, the proposed integer linear program based algorithms cannot be applied to estimate large-scale genetic interaction networks. To compensate for this limitation, a broader graph learning framework is presented that uses the proposed integer linear program based algorithms in an iterative fashion. Furthermore, "off-the-shelf" machine learning algorithms are customized to the graph learning problem. The proposed integer linear program based methods are evaluated on synthetic data as well as real data and compared to state-of-the-art methods. The proposed algorithms are observed to outperform the considered state-of-the-art methods on both the synthetic and the real data. Finally, the proposed integer linear program based algorithms are compared to selected "off-the-shelf" machine learning methods in terms of graph learning performance on synthetic data. Although the considered "off-the-shelf" machine learning methods cannot guarantee that their estimates have the desired genetic interaction network structure, it is worth mentioning that they offer a good tradeoff between estimation quality and the ability to estimate large-scale genetic interaction networks.
