Analysis of High Dimensionality Yeast Gene Expression Data Using Data Mining

Mazin Aouf, Liwan Liyanage

doi:10.4028/www.scientific.net/amm.197.515

Abstract

Data mining is the process of discovering interesting knowledge from large amounts of data stored in databases, data warehouses, or other information repositories. In biological studies, the Yeast Proteome Database (YPD) is a model for the organization and presentation of genome-wide functional data. Accordingly, a gene expression dataset for yeast, a unicellular organism, was selected; it contains 6103 genes, and the database was combined with a number of related datasets to create a general dataset. DNA-binding transcriptional regulators interpret the genome's regulatory code by binding to specific sequences to induce or repress gene expression. Gene products, including RNA and protein, are responsible for the development and functioning of all living organisms and are produced by a two-step process: transcription and translation. Various transcription factors control gene transcription by binding to the promoter regions. Translation is the production of proteins from the mRNA produced in transcription. In this study, of the 169 transcription factors known in yeast, we consider the 22 thought to be involved in the response to hydrogen peroxide (H2O2). Each one is divided into three parts: TF with no H2O2, TF with low H2O2, and TF with high H2O2. The aim of this paper is to integrate hydrogen peroxide response data more effectively with yeast gene expression data in order to obtain a protein response process model and to label a set of important genes related to this approach.
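
The dimensions described in the abstract can be illustrated with a small sketch. It assumes, hypothetically, that the combined dataset holds one row per gene and one column per (transcription factor, H2O2 condition) pair; the abstract does not state the actual layout. The factor names `TF01` to `TF22` and the condition labels are placeholders, since the abstract does not list the 22 factors.

```python
from itertools import product

N_GENES = 6103      # genes in the yeast dataset (from the abstract)
N_FACTORS = 22      # H2O2-related transcription factors (from the abstract)
CONDITIONS = ("no_H2O2", "low_H2O2", "high_H2O2")  # placeholder labels

# Placeholder factor names; the real 22 factors are not named in the abstract.
factors = [f"TF{i:02d}" for i in range(1, N_FACTORS + 1)]

# Assumed layout: one column per (factor, condition) pair.
columns = [f"{tf}_{cond}" for tf, cond in product(factors, CONDITIONS)]

# Assumed shape of the resulting gene-by-feature table.
shape = (N_GENES, len(columns))
print(shape)  # (6103, 66)
```

Under this assumed layout, the 22 factors and three conditions yield 66 binding-related features per gene, before any of the other related datasets are joined in.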
