Abstract
BackgroundGenetic studies in tetraploids are lagging behind in comparison with studies of diploids as the complex genetics of tetraploids require much more elaborated computational methodologies. Recent advancements in development of molecular techniques and computational tools facilitate new methods for automated, high-throughput genotype calling in tetraploid species. We report on the upgrade of the widely-used fitTetra software aiming to improve its accuracy, which to date is hampered by technical artefacts in the data.ResultsOur upgrade of the fitTetra package is designed for a more accurate modelling of complex collections of samples. The package fits a mixture model where some parameters of the model are estimated separately for each sub-collection. When a full-sib family is analyzed, we use parental genotypes to predict the expected segregation in terms of allele dosages in the offspring. More accurate modelling and use of parental data increases the accuracy of dosage calling. We tested the package on data obtained with an Affymetrix Axiom 60 k array and compared its performance with the original version and the recently published ClusterCall tool, showing that at least 20% more SNPs could be called with our updated.ConclusionOur updated software package shows clearly improved performance in genotype calling accuracy. Estimation of mixing proportions of the underlying dosage distributions is separated for full-sib families (where mixture proportions can be estimated from the parental dosages and inheritance model) and unstructured populations (where they are based on the assumption of Hardy-Weinberg equilibrium). Additionally, as the distributions of signal ratios of the dosage classes can be assumed to be the same for all populations, including parental data for some subpopulations helps to improve fitting other populations as well. The R package fitTetra 2.0 is freely available under the GNU Public License as Additional file with this article.
Highlights
Genetic studies in tetraploids are lagging behind in comparison with studies of diploids as the complex genetics of tetraploids require much more elaborated computational methodologies
SNP arrays like Illumina Infinium [6] or Affymetrix Axiom [7] measure the fluorescence signals of two dyes generated by the two alleles of each SNP
We compared the performance of fitTetra 2.0 to fitTetra 1.0 and to the recently published ClusterCall [4] software based on two large datasets
Summary
Genetic studies in tetraploids are lagging behind in comparison with studies of diploids as the complex genetics of tetraploids require much more elaborated computational methodologies. Genetic studies in tetraploid species are lagging behind those of diploids This is mostly due to the more complex genetics of tetraploids. Alleles at a marker locus can be present in five different dosages: 0 (nulliplex), 1 (simplex), 2 (duplex), 3 (triplex) and 4 (quadruplex). Dosage scoring of such markers is challenging and has been computationally approached only since 2011 [1,2,3,4]. The signal intensities should fall into one of the five possible categories In practice these values are continuous and need to be converted into discrete dosage categories. Automated dosage scoring in tetraploid has been approached with k-means clustering [1] and
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.