Abstract

Cancer cells form when active genes stop functioning properly. Timely activation of a gene is governed by the combined effort of multiple transcription factors (TFs). TFs are proteins that bind to DNA in a sequence-specific manner. It is difficult to trace the targets and roles of TFs in the gene regulation process, because the same element acts differently in different places, much as the same word has different meanings in different contexts. This approach treats the cell line as a language, with the genes and TFs as its symbols or letters. Different combinations of symbols form sequences with repetitive patterns. Identifying and analysing such frequently occurring patterns gives better insight into the cell. This work aims to identify such patterns in the cell line using regular expressions. The generated patterns can serve as features for identifying the effect of regulatory elements in a genomic region. To improve readability, the identity of each character in a pattern is documented in a text file. Acute Myeloid Leukaemia (AML) data from the GEO repository and the related narrow-peak binding data for two TFs, calibrated in the K562 cell line by the ENCODE consortium, are taken as a case study.
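The idea of mining repeated symbol patterns with regular expressions can be illustrated with a minimal sketch. It is not the authors' implementation: the symbol table, the example sequence, the function names `frequent_patterns` and `legend_lines`, and the length and count thresholds are all hypothetical, chosen only to show how a regular expression can enumerate overlapping patterns in a sequence of gene and TF symbols and how a character legend could be produced for a text file.

```python
import re
from collections import Counter

def frequent_patterns(sequence, min_len=2, max_len=4, min_count=2):
    """Return substrings of length min_len..max_len that occur at least
    min_count times in sequence (overlapping occurrences are counted)."""
    counts = Counter()
    for k in range(min_len, max_len + 1):
        # A zero-width lookahead with a capture group lets finditer report
        # every length-k window, including overlapping ones.
        window = re.compile(r"(?=(.{%d}))" % k)
        counts.update(m.group(1) for m in window.finditer(sequence))
    return {p: c for p, c in counts.items() if c >= min_count}

def legend_lines(symbol_key):
    """Format the character-to-identity mapping, one entry per line,
    ready to be written to a plain text file."""
    return ["%s\t%s" % (symbol, name) for symbol, name in sorted(symbol_key.items())]

# Hypothetical symbol table: upper case for TFs, lower case for genes.
symbol_key = {"A": "TF_1", "B": "TF_2", "g": "gene_1", "h": "gene_2"}

# Hypothetical encoded region: TF_1, TF_2, gene_1, TF_1, TF_2, gene_2, ...
sequence = "ABgABhABg"

patterns = frequent_patterns(sequence)
legend = legend_lines(symbol_key)
```

With this toy input, the pair `AB` recurs most often, which in the language analogy would correspond to two TFs repeatedly appearing together; the legend lines make each character in a reported pattern traceable back to a named element.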
