Automating microsatellite screening and primer design from multi-individual libraries using Micro-Primers

Filipe Alves, Filipa M S Martins, Miguel Areias, Antonio Muñoz-Mérida

doi:10.1038/s41598-021-04275-8

Journal: Scientific Reports	Publication Date: Jan 7, 2022
Citations: 2	License type: open-access

Affiliation: University of Porto

Abstract

Analysis of intra- and inter-population diversity has become important for defining the genetic status and distribution patterns of a species, and it is a powerful tool for conservation programs, as high levels of inbreeding could lead to the extinction of whole populations within a few generations. Microsatellites (simple sequence repeats, SSRs) are commonly used in population studies, but discovering highly variable regions across a species’ genome requires demanding computation and laboratory optimization. In this work, we combine next-generation sequencing (NGS) with automated computing to develop a genomics-oriented tool for characterizing SSRs at the population level. Herein, we describe a new Python pipeline, named Micro-Primers, designed to identify SSR loci in a multi-individual microsatellite library and to design PCR primers for their amplification. By combining commonly used programs for data cleaning and microsatellite mining, the pipeline generates, from a FASTQ file produced by high-throughput sequencing, standard information about the selected microsatellite loci, including the number of alleles in the population subset and the melting temperature and corresponding PCR product of each primer set. Additionally, potentially polymorphic loci can be identified from the allele ranges observed in the population, which guides the selection of optimal markers for the species. Experimental results show that Micro-Primers significantly reduces processing time compared with manual analysis while maintaining the same quality of results. Without such assistance, the time spent at each step grows with the number of sequences to analyze, and selecting polymorphic loci from multiple individuals can become a major bottleneck in population studies.
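The screening idea summarized above (clean reads, mine microsatellites, then count alleles per locus across individuals and flag loci whose allele range suggests polymorphism) can be illustrated with a minimal, self-contained Python sketch. It is not the Micro-Primers implementation, which chains existing programs for these steps. The function names, the repeat thresholds in `MIN_REPEATS`, the mean-Phred read filter, and keying a locus by its motif plus fixed-length flanks are all assumptions made for illustration; alleles are represented as repeat counts of perfect repeats.

```python
import re
from collections import defaultdict

# Hypothetical minimum repeat units per motif length (2-6 bp); not the tool's defaults.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def fastq_record(read_id, seq, qual_char="I"):
    """Build one 4-line FASTQ record with uniform quality ('I' = Phred 40)."""
    return [f"@{read_id}", seq, "+", qual_char * len(seq)]


def parse_fastq(lines):
    """Yield (read_id, sequence, quality) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


def mean_phred(qual, offset=33):
    """Mean Phred score of a quality string (Phred+33 encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def is_primitive(motif):
    """False if the motif is a repeat of a shorter unit, e.g. 'ACAC' or 'AA'."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq):
    """Return (start, end, motif, n_repeats) for perfect 2-6 bp tandem repeats."""
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            if is_primitive(m.group(1)):
                hits.append((m.start(), m.end(), m.group(1), (m.end() - m.start()) // k))
    return hits


def screen(fastq_by_individual, flank=8, min_q=20):
    """Group alleles (repeat counts) by locus = (motif, left flank, right flank)."""
    loci = defaultdict(lambda: defaultdict(set))
    for individual, lines in fastq_by_individual.items():
        for _, seq, qual in parse_fastq(lines):
            if mean_phred(qual) < min_q:
                continue  # crude read-level quality filter
            for start, end, motif, n in find_ssrs(seq):
                if start < flank or end + flank > len(seq):
                    continue  # not enough flanking sequence to anchor primers
                key = (motif, seq[start - flank:start], seq[end:end + flank])
                loci[key][individual].add(n)
    return loci


def summarize(loci):
    """Per locus: individuals, distinct alleles, allele range (bp), polymorphic flag."""
    rows = []
    for (motif, _left, _right), per_ind in loci.items():
        alleles = sorted(set().union(*per_ind.values()))
        rows.append({
            "motif": motif,
            "individuals": len(per_ind),
            "alleles": alleles,
            "range_bp": (alleles[-1] - alleles[0]) * len(motif),
            "polymorphic": len(alleles) > 1,
        })
    return rows


if __name__ == "__main__":
    left, right = "GATTCGGA", "TTGCATCC"
    reads = {
        "ind1": fastq_record("r1", left + "AC" * 8 + right),
        "ind2": fastq_record("r2", left + "AC" * 10 + right),
    }
    for row in summarize(screen(reads)):
        print(row)
```

With two toy individuals carrying 8 and 10 copies of an AC repeat between identical flanks, the sketch groups both reads into one locus with two alleles and flags it as potentially polymorphic. Keying loci on exact flanks is a deliberate simplification; real libraries need tolerance for sequencing errors and imperfect repeats.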

Highlights

In the omics era, advances in technology[1] have drastically reduced the cost of sequencing and the time required to obtain useful information from different organisms, even uncultured ones, which has broadened the scientific application of sequencing worldwide
The analysis was reproduced on the same data with the three pipelines from Table 1 that are capable of finding polymorphisms in the population dataset (MiMi, SSREnricher and GMATA)
Running the Micro-Primers pipeline produces a single plain-text output file with the information needed to amplify each simple sequence repeat (SSR) locus, based on its representative sequence
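A per-locus output line of the kind described in the highlights pairs each primer set with its melting temperature and PCR product. The Python sketch below is hypothetical: it does not design primers (a dedicated primer-design program would normally do that), it estimates Tm with the simple Wallace rule (2 °C per A/T plus 4 °C per G/C) rather than whatever thermodynamic model the pipeline uses, it locates primers by exact string matching on the representative sequence, and its tab-separated column layout is invented, not the Micro-Primers output format.

```python
# Hypothetical report columns: locus, forward primer, Tm, reverse primer, Tm,
# product size (bp), number of alleles. Not the Micro-Primers file format.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def wallace_tm(primer):
    """Rough Tm (deg C) by the Wallace rule: 2 per A/T plus 4 per G/C."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def product_size(template, fwd, rev):
    """Amplicon length on the representative sequence, or None if the
    forward primer and the reverse primer's binding site are not found in order."""
    start = template.find(fwd)
    rev_site = template.find(revcomp(rev))  # reverse primer binds the other strand
    if start < 0 or rev_site < start:
        return None
    return rev_site + len(rev) - start


def report_line(locus_id, template, fwd, rev, n_alleles):
    """One tab-separated line of a plain-text report (invented layout)."""
    fields = [locus_id, fwd, wallace_tm(fwd), rev, wallace_tm(rev),
              product_size(template, fwd, rev), n_alleles]
    return "\t".join(str(f) for f in fields)


if __name__ == "__main__":
    fwd, rev = "ATGCGTACGTTAGC", "TTTAGGCCATAAGC"
    template = fwd + "AC" * 8 + revcomp(rev)  # toy representative sequence
    print(report_line("locus_1", template, fwd, rev, 2))
```

Because the reverse primer is written 5'→3' on the opposite strand, the sketch searches the template for its reverse complement; the product then spans from the first base of the forward primer to the last base of that binding site.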

Summary

Introduction

In the omics era, advances in technology[1] have drastically reduced the cost of sequencing and the time required to obtain useful information from different organisms, even uncultured ones, which has broadened the scientific application of sequencing worldwide. Existing SSR-mining tools usually require either (1) a reference genome, which means they can be used only when the study species is well known, or else the analysis first demands considerable work to obtain at least a decent draft of the species’ genome; or (2) pre-processed long sequences (contigs) from individual sample libraries, which prevents the detection of highly polymorphic SSR loci. Moreover, they consider only non-enriched libraries, which limits their use in the recovery of polymorphic SSRs for individual

Objectives

Methods

Results

Discussion

Conclusion