Abstract

Over years of applying machine learning in bioinformatics, we have learned that scientists in many areas of the life sciences want deeper knowledge of the modeled phenomenon than the information needed merely to classify objects with a certain quality. Transcriptome profiling by RNA sequencing (RNA-seq), which captures the dynamic activity of genes, is becoming increasingly popular because it measures not only gene expression but also structural variations such as mutations and fusion transcripts. Moreover, single nucleotide polymorphisms (SNPs) have great potential in genetics, breeding, and ecological and evolutionary studies. Rough sets can be employed successfully to tackle problems such as gene expression clustering and classification. This study provides general guidelines for accurate SNP discovery from RNA-seq data. The annotations of those SNPs are used to find the relation between their biological features and the differential expression of the genes to which the SNPs belong. Rough sets are used to express this relationship as a finite set of rules. The resulting set of 32 rules performed well when evaluated in terms of strength, certainty, and coverage. This strategy is applied to the analysis of SNPs in A. thaliana plants under heat stress.
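
The three evaluation terms named above can be illustrated with a minimal sketch. It assumes the standard rough-set (Pawlak) definitions: for a rule "conditions → decision" with support equal to the number of objects matching both sides, strength = support / |U|, certainty = support / (objects matching the conditions), and coverage = support / (objects matching the decision). The feature names (`region`, `effect`), the decision attribute (`expression`), and the toy table are hypothetical and are not taken from the study.

```python
from collections import Counter

def rule_metrics(rows, condition_keys, decision_key):
    """Compute support, strength, certainty and coverage for every
    (conditions -> decision) rule observed in a decision table."""
    n = len(rows)
    cond_count = Counter()   # objects matching each condition pattern
    dec_count = Counter()    # objects in each decision class
    rule_count = Counter()   # objects matching both sides of a rule
    for row in rows:
        cond = tuple(row[k] for k in condition_keys)
        dec = row[decision_key]
        cond_count[cond] += 1
        dec_count[dec] += 1
        rule_count[(cond, dec)] += 1
    metrics = {}
    for (cond, dec), support in rule_count.items():
        metrics[(cond, dec)] = {
            "support": support,
            "strength": support / n,
            "certainty": support / cond_count[cond],
            "coverage": support / dec_count[dec],
        }
    return metrics

# Hypothetical toy decision table: one row per SNP (illustrative values only).
toy_table = [
    {"region": "coding", "effect": "missense",   "expression": "up"},
    {"region": "coding", "effect": "missense",   "expression": "up"},
    {"region": "coding", "effect": "missense",   "expression": "down"},
    {"region": "utr",    "effect": "synonymous", "expression": "down"},
    {"region": "utr",    "effect": "synonymous", "expression": "down"},
    {"region": "coding", "effect": "synonymous", "expression": "up"},
]

metrics = rule_metrics(toy_table, ["region", "effect"], "expression")
for (cond, dec), m in sorted(metrics.items()):
    print(cond, "->", dec, m)
```

A rule with certainty 1.0 is deterministic on the table, while a high coverage means the rule accounts for most objects of its decision class; the study's own rule-generation procedure may differ from this counting sketch.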

Highlights

  • RNA sequencing (RNA-seq) technology has enabled exceptionally fast, wide-scale analysis of the genetic information that exists in all organisms

  • This study proposes a promising framework that illustrates how single nucleotide polymorphisms (SNPs) can be discovered, annotated, and analyzed from RNA-seq data in order to describe gene expression

  • The ultimate goal of this research is to find the relationship between the set of genes expressed under heat stress and the biological features of their detected SNPs, starting from A. thaliana RNA-seq raw reads


Summary

Introduction

RNA sequencing (RNA-seq) technology has enabled exceptionally fast, wide-scale analysis of the genetic information that exists in all organisms. This mainly includes the concurrent study of alternative splicing, single nucleotide polymorphisms (SNPs), and differential expression. Genome-guided transcriptome analysis has been the standard RNA-seq analysis method for model organisms such as A. thaliana. Several software packages are available to perform this task [1]. New tools are continuously being developed for RNA-seq analysis, covering every step from read alignment to pathway analysis. Non-expert users, however, often cannot exploit the full power and capabilities of these tools on a wide scale [2].

