Core Hunter II: fast core subset selection based on multiple genetic diversity measures using Mixed Replica search

Herman De Beukelaer, Petr Smýkal, Guy F Davenport, Veerle Fack

doi:10.1186/1471-2105-13-312

Open Access

https://doi.org/10.1186/1471-2105-13-312

Abstract

Background: Sampling core subsets from genetic resources while maintaining as much as possible of the genetic diversity of the original collection is an important but computationally complex task for gene bank managers. The Core Hunter computer program was developed as a tool to generate such subsets based on multiple genetic measures, including both distance measures and allelic diversity indices. First, we investigate the effect of minimum (instead of the default mean) distance measures on the performance of Core Hunter. Secondly, we try to gain more insight into the performance of the original Core Hunter search algorithm through comparison with several other heuristics, working with several realistic datasets of varying size and allelic composition. Finally, we propose a new algorithm (Mixed Replica search) for Core Hunter II with the aim of improving the diversity of the constructed core sets and their corresponding generation times.

Results: Our results show that the introduction of minimum distance measures leads to core sets in which all accessions are sufficiently distant from each other, which was not always obtained when optimizing mean distance alone. Comparison of the original Core Hunter algorithm, Replica Exchange Monte Carlo (REMC), with simpler heuristics shows that the simpler algorithms often give very good results with lower runtimes than REMC. However, the performance of the simpler algorithms is slightly worse than that of REMC under lower sampling intensities, and some heuristics clearly struggle with minimum distance measures. In comparison, the new advanced Mixed Replica search algorithm (MixRep), which uses heterogeneous replicas, was able to sample core sets with equal or higher diversity scores than REMC and the simpler heuristics, often using less computation time than REMC.

Conclusion: The REMC search algorithm used in the original Core Hunter computer program performs well, sometimes leading to slightly better results than some of the simpler methods, although it does not always give the best results. By switching to the new Mixed Replica algorithm, overall results and runtimes can be significantly improved. Finally, we recommend including minimum distance measures in the objective function when looking for core sets in which all accessions are sufficiently distant from each other. Core Hunter II is freely available as an open source project at http://www.corehunter.org.
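The difference between mean and minimum distance objectives can be illustrated with a small sketch. This is not Core Hunter's implementation. The distance matrix, the accession indices, and the function names below are hypothetical, chosen only to show how a core set can score well on mean pairwise distance while still containing two identical accessions.

```python
from itertools import combinations

def pairwise_distances(dist, subset):
    """Return all pairwise distances between the accessions in `subset`."""
    return [dist[i][j] for i, j in combinations(subset, 2)]

def mean_distance(dist, subset):
    """Mean of all pairwise distances within the subset."""
    d = pairwise_distances(dist, subset)
    return sum(d) / len(d)

def min_distance(dist, subset):
    """Smallest pairwise distance within the subset."""
    return min(pairwise_distances(dist, subset))

# Hypothetical symmetric distance matrix for four accessions.
# Accessions 0 and 1 are duplicates (distance 0.0).
DIST = [
    [0.0, 0.0, 1.0, 0.4],
    [0.0, 0.0, 1.0, 0.4],
    [1.0, 1.0, 0.0, 0.4],
    [0.4, 0.4, 0.4, 0.0],
]

core_a = [0, 1, 2]  # contains the duplicate pair (0, 1)
core_b = [0, 2, 3]  # no duplicates

for name, core in (("core_a", core_a), ("core_b", core_b)):
    print(name, "mean:", mean_distance(DIST, core),
          "min:", min_distance(DIST, core))
```

In this toy example the mean distance favours core_a even though its minimum distance is zero, whereas core_b has a lower mean but a clearly nonzero minimum. This matches the situation described in the Results, where optimizing mean distance alone did not always yield core sets in which all accessions are sufficiently distant from each other.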

Highlights

Sampling core subsets from genetic resources while maintaining as much as possible of the genetic diversity of the original collection is an important but computationally complex task for gene bank managers.
For the large pea dataset, LR requires much more time than the runtime limit applied to Replica Exchange Monte Carlo (REMC). Because of this large difference in runtimes, we experimented with applying higher runtime limits to REMC, but even with a limit of 2 hours instead of 10 minutes the results of REMC barely improve compared to the results shown in Table 3, and REMC still does not succeed in sampling cores with nonzero minimum distance.
We conclude that when aiming at high minimum distances the results of Mixed Replica search (MixRep) are very similar to those of the LR method and often significantly better than those of all other methods (REMC, and MSTRAT and Local Search as shown before).

Summary

Introduction

Sampling core subsets from genetic resources while maintaining as much as possible of the genetic diversity of the original collection is an important but computationally complex task for gene bank managers. The Core Hunter computer program was developed as a tool to generate such subsets based on multiple genetic measures, including both distance measures and allelic diversity indices. To be able to generate diverse core sets, we need evaluation measures that express the diversity of a given collection. These measures are based on a variety of criteria including phenotypic traits or genetic marker data [2,3,4,5,6,7], or a combination of both [8,9]. Many algorithms for core set selection have been proposed, including stratified sampling techniques. These stratified sampling strategies first perform a clustering of the entire collection and then sample accessions from each cluster, based on some allocation method. Several allocation methods have been proposed, including the P-, L- and D-methods.
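The allocation step of stratified sampling can be sketched as follows. The text above does not define the allocation methods, so this sketch relies on the usual reading of the P-method as allocation proportional to cluster size and the L-method as allocation proportional to the logarithm of cluster size. Both readings are assumptions here, as are the function name `allocate`, the example cluster sizes, and the largest-remainder rounding. The D-method is omitted because it needs within-cluster distances.

```python
import math

def allocate(cluster_sizes, core_size, method="P"):
    """Split `core_size` picks over clusters (hypothetical helper).

    method "P": weights proportional to cluster size (assumed definition).
    method "L": weights proportional to log(cluster size) (assumed definition);
                clusters of size 1 get weight 0, so at least one cluster
                must have more than one accession.
    """
    if method == "P":
        weights = [float(n) for n in cluster_sizes]
    elif method == "L":
        weights = [math.log(n) for n in cluster_sizes]
    else:
        raise ValueError("method must be 'P' or 'L'")

    total = sum(weights)
    quotas = [core_size * w / total for w in weights]

    # Largest-remainder rounding so the allocations sum to core_size.
    alloc = [math.floor(q) for q in quotas]
    remaining = core_size - sum(alloc)
    by_remainder = sorted(range(len(quotas)),
                          key=lambda k: quotas[k] - alloc[k],
                          reverse=True)
    for k in by_remainder[:remaining]:
        alloc[k] += 1
    return alloc

# One large and two small clusters, core of 12 accessions.
sizes = [100, 10, 10]
p_alloc = allocate(sizes, 12, method="P")
l_alloc = allocate(sizes, 12, method="L")
print("P:", p_alloc)
print("L:", l_alloc)
```

With these example sizes, proportional allocation gives the large cluster most of the picks, while logarithmic allocation shifts picks towards the small clusters. The sketch does not check that an allocation stays within the size of its cluster, and the actual sampling of accessions within each cluster is left out.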

Journal: BMC Bioinformatics
Publication Date: Nov 23, 2012
Citations: 68
License type: CC BY 2.0
