Abstract
Analysis of intra- and inter-population diversity has become important for defining the genetic status and distribution patterns of a species, and it is a powerful tool for conservation programs, as high levels of inbreeding can lead to the extinction of a whole population within a few generations. Microsatellites (SSRs) are commonly used in population studies, but discovering highly variable regions across a species' genome requires demanding computation and laboratory optimization. In this work, we combine next generation sequencing (NGS) with automated computing to develop a genomic-oriented tool for characterizing SSRs at the population level. Herein, we describe a new Python pipeline, named Micro-Primers, designed to identify SSR loci in a multi-individual microsatellite library and to design PCR primers for their amplification. By combining commonly used programs for data cleaning and microsatellite mining, this pipeline generates, from a fastq file produced by high-throughput sequencing, standard information about the selected microsatellite loci, including the number of alleles in the population subset, and the melting temperature and respective PCR product of each primer set. Additionally, potentially polymorphic loci can be identified based on the allele ranges observed in the population, to guide the selection of optimal markers for the species. Experimental results show that Micro-Primers significantly reduces processing time in comparison to manual analysis while maintaining the same quality of results. The elapsed time at each step can be longer depending on the number of sequences to analyze and, if not assisted, the selection of polymorphic loci from multiple individuals can represent a major bottleneck in population studies.
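To make the idea of selecting loci by observed allele range more concrete, the following is a minimal Python sketch. It is not the Micro-Primers implementation: the regular expression, the thresholds (`MIN_REPEATS`, `FLANK`), and the function names are all assumptions chosen for illustration. Reads are grouped into loci by their flanking sequences, and a locus is flagged as potentially polymorphic when more than one repeat count is observed.

```python
import re
from collections import defaultdict

# Illustrative thresholds (assumptions, not Micro-Primers defaults).
MIN_REPEATS = 5   # minimum number of motif copies to call an SSR
FLANK = 10        # flank length (bp) used to identify a locus

# Perfect repeats of a 2-6 bp motif; the lazy quantifier prefers the
# shortest motif that explains the repeat.
SSR_PATTERN = re.compile(r"([ACGT]{2,6}?)\1{%d,}" % (MIN_REPEATS - 1))


def find_ssrs(seq):
    """Yield (motif, repeat_count, start, end) for each perfect SSR."""
    for m in SSR_PATTERN.finditer(seq.upper()):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        yield motif, len(m.group(0)) // len(motif), m.start(), m.end()


def alleles_by_locus(reads):
    """Group repeat counts by (left flank, motif, right flank)."""
    loci = defaultdict(set)
    for read in reads:
        read = read.upper()
        for motif, count, start, end in find_ssrs(read):
            left = read[max(0, start - FLANK):start]
            right = read[end:end + FLANK]
            # Keep only SSRs with full flanks on both sides, since
            # primers would have to be placed in those regions.
            if len(left) == FLANK and len(right) == FLANK:
                loci[(left, motif, right)].add(count)
    return loci


def polymorphic_loci(loci, min_alleles=2):
    """Return loci where at least min_alleles repeat counts were seen."""
    return {k: sorted(v) for k, v in loci.items() if len(v) >= min_alleles}


if __name__ == "__main__":
    # Toy reads: two alleles of an (AC)n locus and one (GAT)n locus.
    reads = [
        "GATTACAGGT" + "AC" * 6 + "TTGCCATGCA",
        "GATTACAGGT" + "AC" * 9 + "TTGCCATGCA",
        "CCGGTTAACG" + "GAT" * 5 + "CGTTAGCCTA",
    ]
    for locus, alleles in polymorphic_loci(alleles_by_locus(reads)).items():
        print(locus, alleles)
```

Exact flank matching is a deliberate simplification; real reads contain sequencing errors, so a production pipeline would typically cluster similar flanking sequences rather than require identity.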
Highlights
In the omics era, advances in technology have drastically reduced the cost of sequencing and the time required to obtain useful information from different organisms, even uncultured ones[1], which has broadened its scientific application worldwide
The analysis was reproduced using the same data in three pipelines from Table 1 capable of finding polymorphisms in the population dataset (MiMi, SSREnricher and GMATA)
Running the Micro-Primers pipeline produces a single plain-text output file with useful information for the amplification of simple sequence repeat (SSR) loci, based on the representative sequence of each locus
Summary
In the omics era, advances in technology have drastically reduced the cost of sequencing and the time required to obtain useful information from different organisms, even uncultured ones[1], which has broadened its scientific application worldwide. These tools usually either (1) require a reference genome, which implies that they can be used only when the study species is well known or after considerable prior work to obtain at least a decent draft of the species' genome, or (2) work with pre-processed long sequences (contigs) from individual sample libraries, thereby preventing the detection of highly polymorphic SSR loci. They only consider non-enriched libraries, which limits their use in the recovery of polymorphic SSRs for individual