Abstract
Background: In biological experiments on soybean, molecular markers are widely used to verify the soybean genome or to construct its genetic map. Among the available molecular markers, insertions and deletions (InDels) are preferred because of their wide distribution and high density at the whole-genome level. Detecting InDels from next-generation sequencing data is therefore of great importance for the design of InDel markers. To tackle this problem, this paper integrated machine learning techniques with existing software and developed two algorithms for InDel detection: the best F-score method (BF-M) and the Support Vector Machine (SVM) method (SVM-M), which is based on the classical SVM model.
Results: The experimental results show that BF-M performed well, with high precision and recall scores, whereas SVM-M yielded the best performance in terms of recall and F-score. Moreover, from the InDel markers detected by SVM-M in soybeans collected from 56 different regions, highly polymorphic loci were selected to construct an InDel marker database for soybean.
Conclusions: Compared to existing software tools, the two algorithms proposed in this work produced substantially higher precision and recall scores and remained stable across various types of genomic regions. Moreover, based on SVM-M, we have constructed a database of soybean InDel markers and published it for academic research.
Highlights
In the biological experiments of soybean species, molecular markers are widely used to verify the soybean genome or construct its genetic map
To detect insertion and deletion (InDel) markers more accurately and comprehensively by using the strategy mentioned above, we propose two InDel detection methods: the best F-score method (BF-M), which is based on the optimal F-score, a measure of accuracy that considers both precision and recall, and the Support Vector Machine (SVM) method (SVM-M), which is designed according to the SVM model
Experiment setup: To evaluate the performance of BF-M and SVM-M, we compared them against five software tools: Samtools, GATK UnifiedGenotyper (GATK-UG), Pindel, SOAPindel and VarScan
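The precision, recall and F-score used to assess the InDel callers can be illustrated with a small sketch. This is not the authors' BF-M implementation, whose details are not given here; it only shows how the three scores are computed from a set of calls and a truth set, and how a cutoff could be chosen to maximize the F-score. The function names, the `(chromosome, position)` representation of an InDel, the per-call confidence scores and the candidate thresholds are all hypothetical and chosen for illustration.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical, simplified InDel identifier: (chromosome, position).
Call = Tuple[str, int]


def precision_recall_f(called: Set[Call], truth: Set[Call]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) of a call set against a truth set."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


def best_f_threshold(
    scored_calls: Dict[Call, float],
    truth: Set[Call],
    thresholds: Iterable[float],
) -> Tuple[float, float]:
    """Return the (threshold, F-score) pair with the highest F-score.

    A call is kept when its (assumed) confidence score is >= the threshold.
    Ties are resolved in favor of the first threshold tried.
    """
    best_threshold, best_f = None, -1.0
    for threshold in thresholds:
        kept = {call for call, score in scored_calls.items() if score >= threshold}
        _, _, f_score = precision_recall_f(kept, truth)
        if f_score > best_f:
            best_threshold, best_f = threshold, f_score
    if best_threshold is None:
        raise ValueError("at least one threshold is required")
    return best_threshold, best_f


# Toy data (invented for illustration, not taken from the paper).
truth = {("chr1", 100), ("chr1", 250), ("chr2", 40)}
scored_calls = {
    ("chr1", 100): 0.9,
    ("chr1", 250): 0.6,
    ("chr2", 40): 0.3,
    ("chr2", 77): 0.5,  # false positive
    ("chr3", 5): 0.2,   # false positive
}
threshold, f_score = best_f_threshold(scored_calls, truth, [0.2, 0.5, 0.8])
print(threshold, round(f_score, 3))
```

On this toy data the loosest cutoff wins: it recovers every true InDel, and the gain in recall outweighs the two false positives it lets through. A stricter cutoff raises precision but lowers the F-score.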
Summary
In biological experiments on soybean, molecular markers are widely used to verify the soybean genome or to construct its genetic map. Among the available molecular markers, insertions and deletions (InDels) are preferred because of their wide distribution and high density at the whole-genome level. The development of molecular markers has undergone various stages, including restriction fragment length polymorphism (RFLP), single-strand conformation polymorphism (SSCP), random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), simple sequence repeats (SSR) [2], single nucleotide polymorphisms (SNPs) [3, 4], and short insertions and deletions (InDels) [1].
The main difference among the existing InDel detection software tools lies in the models they use to identify InDel markers. Samtools and GATK-UG investigate the results of alignment between sequencing data and the reference genome, and employ different Bayesian statistical models to calculate