Abstract

Advances in next-generation sequencing over the last few years have enabled a growing number of applications in biology and medicine, from whole-genome to small-RNA sequencing, with increased throughput accompanied by plunging costs. This thesis focuses on two of the most widely used applications: small-RNA sequencing, used to investigate the biological function of the growing population of small non-coding RNAs, including microRNAs (miRNAs), and exome sequencing, used to identify single nucleotide variations (SNVs) and small insertions and deletions (InDels). Two datasets were used: the first obtained by small-RNA sequencing of human breast cancer MCF-7 cells in two different conditions, and the second obtained by exome sequencing of patients with a rare syndrome, malignant migrating partial seizures of infancy. Each experiment produced a large amount of data, which required comprehensive pipelines to analyze.
Small-RNA sequencing is a novel technology widely used to investigate, with high sensitivity and specificity, small non-coding RNA populations comprising microRNAs and other regulatory transcripts. Gathering biologically relevant information, such as detection and differential expression analysis of known and novel non-coding RNAs and target prediction, requires multiple statistical and bioinformatics tools from different sources, each focusing on a single step of the analysis. As a result, a novel modular pipeline called iMir was designed for comprehensive analysis of miRNA-Seq data, covering adapter trimming and quality filtering through differential expression and biological target prediction, together with other useful options, by integrating multiple open-source modules and resources in an automated workflow. The pipeline was applied to analyze miRNA-Seq datasets from human breast cancer MCF-7 cells simultaneously, resulting in rapid and accurate identification, quantification and differential expression analysis of ~450 miRNAs, including several novel miRNAs and isomiRs, as well as identification of the putative mRNA targets of the differentially expressed miRNAs.
Exome sequencing, the targeted sequencing of the coding regions of the genome, is a powerful and cost-effective technique for dissecting the genetic basis of diseases and traits that have proved intractable with conventional gene-discovery strategies. To reduce the number of false-positive variations and make the results easier to interpret, a comprehensive pipeline integrating different tools was developed. Starting from quality checking and alignment, base quality score recalibration and local realignment around InDels were performed, and SNVs and InDels were then called. Finally, different filters were applied to discard variations with low quality or low coverage. The pipeline was then used to analyze exome sequencing data from six patients with malignant migrating partial seizures of infancy, also known as MMPSI or MMPEI. After analysis and filtering, variants shared by 6, 5, 4 and 3 patients were studied to identify putative disease-causing mutation(s). The results indicate that the pipeline accurately identifies SNVs and short InDels and reliably provides a global and quantitative catalogue of nucleotide variants in the exome.
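The final exome steps described above (discarding low-quality or low-coverage calls, then grouping variants by how many patients carry them) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the `Call` record, the function names, and the thresholds `min_qual=30.0` and `min_depth=10` are hypothetical choices made only for illustration, and the abstract does not state which tools or cut-offs were actually used.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class Call(NamedTuple):
    """One variant call in one patient (hypothetical record layout)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float   # variant quality score
    depth: int    # read coverage at the site


VariantKey = Tuple[str, int, str, str]


def passing_variants(calls: Iterable[Call],
                     min_qual: float = 30.0,
                     min_depth: int = 10) -> Set[VariantKey]:
    """Keep calls meeting both thresholds (illustrative values, not the thesis's)."""
    return {
        (c.chrom, c.pos, c.ref, c.alt)
        for c in calls
        if c.qual >= min_qual and c.depth >= min_depth
    }


def variants_by_sharing(per_patient: Dict[str, Iterable[Call]],
                        min_patients: int = 3,
                        min_qual: float = 30.0,
                        min_depth: int = 10) -> Dict[int, List[VariantKey]]:
    """Group filtered variants by the number of patients that carry them."""
    counts: Counter = Counter()
    for calls in per_patient.values():
        # A set per patient ensures each patient is counted once per variant.
        counts.update(passing_variants(calls, min_qual, min_depth))

    shared: Dict[int, List[VariantKey]] = {}
    for key, n in counts.items():
        if n >= min_patients:
            shared.setdefault(n, []).append(key)
    for keys in shared.values():
        keys.sort()
    return shared
```

With six patients and `min_patients=3`, the returned dictionary would have up to four keys (6, 5, 4 and 3), matching the sharing levels examined in the abstract. Keying variants on chromosome, position, reference and alternate allele is one simple way to decide that two patients carry "the same" variant; other normalizations are possible.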

