Abstract

Cancer cells are formed when active genes stop functioning properly. Timely activation of a gene is governed by the combined effort of multiple transcription factors (TFs), proteins that bind to DNA in a sequence-specific manner. It is difficult to trace the targets and roles of TFs in the gene regulation process, because the same element acts differently in different places, much as the same word has a different meaning in a different context. This approach treats the cell line as a language in which the genes and TFs are the symbols or letters. Different combinations of symbols form sequences with repetitive patterns, and identifying and analysing such frequently occurring patterns gives better insight into the cell. This work mainly aims to identify such patterns in the cell line using regular expressions. The patterns generated in this work can be used as features for identifying the effect of regulatory elements in a genomic region. To improve readability, the identity of each character present in a pattern is documented in a text file. Acute myeloid leukaemia (AML) data from the GEO repository and the related narrow-peak binding data for two TFs, calibrated in the K562 cell line from the ENCODE consortium, are taken as a case study.
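As a rough illustration of the idea described above, the following minimal sketch encodes an ordered list of regulatory elements as single-character symbols, counts recurring fixed-length patterns with a regular expression, and builds a legend mapping each character back to its element. The symbol table, element names (`TF_A`, `TF_B`, `GENE`), function names, and pattern length are all hypothetical and chosen only for the example; this is not the encoding or procedure used in the work itself.

```python
import re
from collections import Counter

# Hypothetical symbol table: one character per regulatory element.
# The names and letters here are assumptions, not taken from the paper.
SYMBOLS = {"GENE": "G", "TF_A": "A", "TF_B": "B"}


def encode(elements):
    """Map an ordered list of element names to a symbol string."""
    return "".join(SYMBOLS[name] for name in elements)


def frequent_patterns(sequence, length, min_count=2):
    """Count overlapping substrings of `length` symbols.

    A zero-width lookahead lets the regex report overlapping matches,
    which a plain findall on the pattern itself would skip.
    """
    pattern = re.compile(r"(?=(.{%d}))" % length)
    counts = Counter(m.group(1) for m in pattern.finditer(sequence))
    return {p: c for p, c in counts.items() if c >= min_count}


def legend_lines(symbols):
    """Build 'character<TAB>element' lines documenting each symbol."""
    ordered = sorted(symbols.items(), key=lambda kv: kv[1])
    return [f"{char}\t{name}" for name, char in ordered]


# Toy input: elements in the order they occur along a genomic region.
elements = ["TF_A", "GENE", "TF_B", "TF_A", "GENE", "TF_B", "GENE"]
sequence = encode(elements)
patterns = frequent_patterns(sequence, length=3)

print(sequence)
print(patterns)
print("\n".join(legend_lines(SYMBOLS)))
```

In practice the legend lines would be written to a text file alongside the patterns, so that each character in a reported pattern can be traced back to the gene or TF it stands for.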
