Abstract
The upstream regions of genes are expected to contain many still-unknown regulatory regions that can increase or decrease the expression of specific genes. Mining distinctive patterns (regions) proceeds in two steps: maximal repeats (patterns) are extracted from the upstream DNA sequences of human genes, and the patterns are then filtered to keep those whose class frequency distribution fits the one specified by domain experts. The class frequency distribution of a pattern is the set of frequencies with which that pattern appears in each class. Extracting maximal repeats and computing their class frequency distributions at the same time can be done with a scalable approach, based on previous work, that uses the MapReduce programming model. The experimental data are the 5,000 bp upstream DNA sequences of 49,267 human coding and non-coding genes. The genes are divided into four classes: “non-cancer related protein-coding gene”, “oncogene”, “tumor suppressor gene” and “non-coding genes” (RNA). Experimental results show that 17 distinctive patterns are selected as core patterns; each is longer than 36 bp and appears in more than 3,000 genes and in all four classes. For a more specific observation, 22 distinctive patterns are selected that appear in at least 10 genes, are longer than 15 bp and, most importantly, occur in only two classes, “oncogene” and “tumor suppressor gene”. It would be attractive to extend this approach, based on the class frequency distribution of selected patterns, to mine other types of distinctive patterns, e.g. biomarkers, if the targeted genomic sequences containing “genotypes” become available and each sequence is labeled precisely according to features, e.g. “phenotypes”, specified by domain experts.
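The filtering step can be illustrated with a minimal Python sketch. It is not the authors' MapReduce implementation: it assumes the candidate patterns have already been extracted, takes "frequency" to mean the number of sequences in a class that contain the pattern, and reads "occur in only two classes" as "present in both of those classes and in no other". The function names, parameters, and toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter

def class_frequency_distribution(pattern, sequences):
    """Count, per class, how many labeled sequences contain `pattern`.

    `sequences` is a list of (class_label, dna_string) pairs.
    Assumption: frequency = number of sequences containing the pattern.
    """
    counts = Counter()
    for label, seq in sequences:
        if pattern in seq:
            counts[label] += 1
    return counts

def select_patterns(patterns, sequences, min_length, min_genes,
                    required_classes, exclusive=False):
    """Keep patterns longer than `min_length` that occur in at least
    `min_genes` sequences and in every class of `required_classes`.
    With exclusive=True the pattern must occur in no other class.
    """
    required = set(required_classes)
    selected = []
    for pattern in patterns:
        if len(pattern) <= min_length:
            continue
        dist = class_frequency_distribution(pattern, sequences)
        if sum(dist.values()) < min_genes:
            continue
        present = set(dist)
        if exclusive:
            matches = present == required
        else:
            matches = required <= present
        if matches:
            selected.append(pattern)
    return selected

# Toy data (hypothetical); real inputs would be 5,000 bp upstream sequences.
sequences = [
    ("oncogene", "ACGTACGTTT"),
    ("tumor suppressor gene", "GGACGTACCA"),
    ("non-coding genes", "TTTTGGGCCC"),
    ("non-cancer related protein-coding gene", "CCCTTTTGAA"),
]
patterns = ["ACGTAC", "TTTTG", "CCC"]

selected = select_patterns(
    patterns,
    sequences,
    min_length=4,
    min_genes=2,
    required_classes=["oncogene", "tumor suppressor gene"],
    exclusive=True,
)
print(selected)
```

On the toy data only "ACGTAC" is kept, because it is the only candidate found in both cancer-related classes and nowhere else. Thresholds in the spirit of the abstract would be, for example, `min_length=15` and `min_genes=10` for the two-class selection.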