Multilocus variable number tandem repeat analysis (MLVA) exploits polymorphism in short tandemly repeated DNA sequences, termed variable number tandem repeats (VNTRs), to differentiate closely related organisms. One research challenge is to find an optimal set of VNTR loci that distinguishes different members accurately. An intuitive approach is exhaustive search. However, exhaustive search is not an efficient way to find optimal solutions in a dataset comprising many attributes (loci), because of the curse of dimensionality. In this study, metaheuristic methods are proposed to find an optimal combination of loci. A basic genetic algorithm (BGA) and a modified genetic algorithm (MGA) were proposed in our previous work for this purpose. However, they require prior knowledge from an experienced user to specify the minimum number of loci needed to achieve good results. To remove this expertise requirement from parameter setting, a GA with Duplicates (GAD) is developed, which allows duplicated loci to be included in a chromosome (potential solution) during the search process. The study also investigates the search performance of a hybrid metaheuristic method, namely quantum-inspired differential evolution (QDE). The Hunter-Gaston Discriminatory Index (HGDI) is used to indicate the discriminatory power of a combination of loci. Two Mycobacterium tuberculosis MLVA datasets, obtained from a public portal and a local laboratory respectively, are used. The results obtained by exhaustive search and by the metaheuristic methods are compared first, followed by a statistical performance comparison among BGA, MGA, GAD, and QDE. The best-performing GA method (i.e., GAD) and QDE are then compared statistically with several recent metaheuristic methods on both MLVA datasets. The statistical results show that both GAD and QDE could achieve higher HGDI than the recent methods while using a small but informative combination of loci.
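As an illustration of the objective described above, the following minimal Python sketch computes the HGDI for a chosen subset of loci and finds the best subset of a fixed size by exhaustive search. It uses the standard Hunter-Gaston formula, D = 1 - sum_j n_j(n_j - 1) / (N(N - 1)), where N is the number of isolates and n_j is the number of isolates sharing the j-th profile. The function names (`hgdi`, `hgdi_for_loci`, `exhaustive_best`) and the toy data are hypothetical and are not taken from the study; the study's own implementation and datasets may differ.

```python
from collections import Counter
from itertools import combinations


def hgdi(profiles):
    """Hunter-Gaston index: 1 - sum(n_j * (n_j - 1)) / (N * (N - 1))."""
    n = len(profiles)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(profiles)  # n_j for each distinct profile
    return 1.0 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def hgdi_for_loci(data, loci):
    """HGDI when each isolate is typed only on the selected loci (column indices)."""
    return hgdi([tuple(row[i] for i in loci) for row in data])


def exhaustive_best(data, k):
    """Return the k-locus subset with the highest HGDI by trying every subset."""
    n_loci = len(data[0])
    return max(combinations(range(n_loci), k),
               key=lambda loci: hgdi_for_loci(data, loci))


# Hypothetical toy data: 5 isolates (rows) x 4 loci (columns of repeat counts).
toy = [
    (2, 3, 1, 4),
    (2, 3, 2, 4),
    (2, 5, 1, 4),
    (3, 5, 1, 4),
    (3, 5, 1, 4),
]

best = exhaustive_best(toy, 3)
print(best, hgdi_for_loci(toy, best))
```

The number of subsets examined by `exhaustive_best` grows combinatorially with the number of loci, which is the cost that motivates replacing exhaustive search with metaheuristic methods on larger datasets.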