Abstract

We present a linear-time, deterministic algorithm for finding a longest Open Reading Frame (ORF) in an alternatively spliced gene represented by a splice graph. Finding protein-encoding regions is a fundamental problem in genomic and transcriptomic analysis, and in some circumstances long ORFs can provide good predictions of such regions. Splice graphs are a common way of compactly representing what may be exponentially many alternative splicings of a sequence. The efficiency of our algorithm is achieved by pruning the search space so as to bound the number of reading frames considered at any vertex of the splice graph. The algorithm guarantees that the unpruned reading frames contain at least one longest ORF of the gene. We are therefore able to find a longest ORF among all splice variants in time linear in the size of the splice graph, even though the number of potential transcripts may be much larger.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.