Abstract

Cost-efficient next-generation sequencers can now produce unprecedented volumes of raw DNA data, posing challenges for annotation. Supervised machine learning approaches have traditionally been used to analyse and annotate complex genomic information. However, such approaches require labelled data for training, which in practice are scarce or expensive to obtain, whereas unlabelled data are abundant. For some problems, semi-supervised learning can improve supervised classifiers by exploiting large amounts of unlabelled data and the latent information within them. We evaluate the applicability of semi-supervised learning algorithms to the problem of DNA sequence annotation, specifically to the prediction of alternatively spliced exons. We employ the Expectation Maximisation, Self-training, and Co-training algorithms to assess the strengths and limitations of these techniques in the context of alternative splicing.
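To make the idea of self-training concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a nearest-centroid base classifier over dinucleotide frequencies, and it uses the gap between the two closest class centroids as a confidence score. All function names, parameters, sequences, and labels are hypothetical; the sequences are synthetic and carry no biological meaning.

```python
from collections import Counter
from itertools import product

# All 16 dinucleotides, used as a fixed feature order (illustrative choice).
KMERS = ["".join(p) for p in product("ACGT", repeat=2)]


def kmer_features(seq, k=2):
    """Normalised dinucleotide frequency vector for a DNA string."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts.values()), 1)
    return [counts[m] / total for m in KMERS]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(X, y):
    """Nearest-centroid classifier: one mean feature vector per class."""
    model = {}
    for label in set(y):
        rows = [x for x, l in zip(X, y) if l == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def predict_with_margin(model, x):
    """Return (label, margin); margin is the distance gap between the
    two closest centroids and serves as a crude confidence score."""
    ranked = sorted((sq_dist(x, c), label) for label, c in model.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else 0.0
    return ranked[0][1], margin


def self_train(X_lab, y_lab, X_unlab, per_round=2, max_rounds=10):
    """Repeatedly pseudo-label the most confident unlabelled examples
    and retrain on the enlarged labelled set."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        if not pool:
            break
        model = fit(X_lab, y_lab)
        scored = [(predict_with_margin(model, x), i) for i, x in enumerate(pool)]
        scored.sort(key=lambda item: item[0][1], reverse=True)
        chosen = scored[:per_round]
        for (label, _margin), i in chosen:
            X_lab.append(pool[i])
            y_lab.append(label)
        taken = {i for _, i in chosen}
        pool = [x for i, x in enumerate(pool) if i not in taken]
    return fit(X_lab, y_lab), y_lab


# Synthetic toy data: two labelled and four unlabelled sequences.
labelled = [("GCGCGGCC", "alternative"), ("ATATTAAT", "constitutive")]
unlabelled = ["GGCCGCGC", "TTAATATA", "GCGGCGCA", "ATTATAAT"]

model, labels = self_train(
    [kmer_features(s) for s, _ in labelled],
    [l for _, l in labelled],
    [kmer_features(s) for s in unlabelled],
)
prediction, _ = predict_with_margin(model, kmer_features("GCCGGCGG"))
print(prediction, len(labels))
```

Self-training adds its own predictions to the training set, so an early mistake can reinforce itself. Limiting how many pseudo-labels are accepted per round (`per_round` here) is one common way to dampen that effect.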
