Abstract

Identification of regulatory motifs in DNA sequences can be seen as a prerequisite for understanding many biological processes, including gene transcription and regulation. Regulatory motifs are usually located in gene promoters, which in turn lie in intergenic regions. Thus, analyzing intergenic regions can help identify regulatory motifs. However, simply searching for known motifs can yield many hits that are not necessarily active in transcription regulation. In this paper, we explore a motif-based machine learning approach to identifying active intergenic regulatory elements. More precisely, we use machine learning algorithms to learn models that predict the direction of transcription for pairs of consecutive genes in Arabidopsis thaliana, using motifs from AthaMap and PLACE as features. Under the assumption that predictive motifs correspond to active regulatory elements, we identify active motifs by performing feature selection and feature abstraction. Experimental results show that feature selection and feature abstraction both contribute substantially to good performance on the prediction problem considered in this work [3].
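The abstract does not name the learning algorithm or the feature-scoring criterion. As a rough illustration of the motif-based pipeline it describes, the sketch below represents each intergenic region by binary motif-presence features (checked on both strands) and ranks motifs by information gain with respect to a transcription-direction label. The motif names and sequences (`motif_A`, `motif_B`, `motif_C`), the class labels (`divergent`, `tandem`), and the use of information gain are all hypothetical choices for illustration, not the authors' method; in the actual study the motifs would come from AthaMap and PLACE.

```python
import math
from collections import Counter

# Hypothetical toy motifs (exact-match consensus strings). These are NOT
# actual AthaMap or PLACE entries; they only illustrate the feature encoding.
MOTIFS = {
    "motif_A": "TATAAA",
    "motif_B": "CACGTG",
    "motif_C": "GGATA",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_features(region: str) -> dict[str, int]:
    """Binary presence (1/0) of each motif on either strand of a region."""
    region = region.upper()
    rc = reverse_complement(region)
    return {name: int(site in region or site in rc) for name, site in MOTIFS.items()}


def entropy(labels) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(xs, ys) -> float:
    """Reduction in label entropy obtained by splitting on a binary feature."""
    n = len(ys)
    gain = entropy(ys)
    for value in (0, 1):
        subset = [y for x, y in zip(xs, ys) if x == value]
        if subset:
            gain -= len(subset) / n * entropy(subset)
    return gain


def rank_motifs(regions, labels, k):
    """Return the k motifs with the highest information gain (ties keep MOTIFS order)."""
    rows = [motif_features(r) for r in regions]
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in MOTIFS
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy intergenic regions with hypothetical direction labels.
    regions = ["AATATAAAGC", "GGTATAAACC", "CCCACGTGAA", "TTCACGTGTT"]
    labels = ["divergent", "divergent", "tandem", "tandem"]
    print(rank_motifs(regions, labels, 2))  # ['motif_A', 'motif_B']
```

Information gain is used here only because it is a simple, self-contained filter criterion. Under the abstract's assumption, the top-ranked motifs would be the candidates for active regulatory elements, and a classifier trained on only those features should lose little predictive accuracy.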
