Automatic Clustering of DNA Sequences With Intelligent Techniques

Yasmin A Badr, Khaled T Wassif, Mahmoud Othman

doi:10.1109/access.2021.3119560

Open Access

Abstract

With the discovery of new DNA sequences, a fundamental problem is how to assign those sequences to the correct species. Unfortunately, two major difficulties in clustering analysis are identifying all data groups correctly and having to predefine k when assigning a set of DNA sequences to k clusters, especially when the data have many dimensions and the number of clusters is large and hard to guess. Furthermore, finding a similarity measure that preserves functionality and represents both the composition and the distribution of the bases in a DNA sequence is one of the main challenges in computational biology. In this paper, a new soft computing metaheuristic framework is introduced for automatic clustering, which generates the optimal cluster formation and determines the best estimate of the number of clusters. A pulse coupled neural network (PCNN) is used to calculate the similarity or dissimilarity of DNA sequences. The bat algorithm is hybridized with the well-known genetic algorithm to solve the automatic data clustering problem. Extensive computational experiments are conducted on the expanded human oral microbiome database (eHOMD). A comparison of the experimental results shows that the proposed hybrid algorithm outperforms both the standard genetic algorithm and the standard bat algorithm. Moreover, the hybrid algorithm is compared with competing algorithms from the literature to establish its superiority. The Mann-Whitney-Wilcoxon rank-sum test is conducted to statistically validate the obtained clusters.
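The abstract does not say how the bat algorithm and the genetic algorithm are combined. Purely as an illustration, the Python sketch below uses the commonly cited bat-algorithm update rules (random frequency, velocity towards the best solution, loudness A and pulse rate r) and then, in each generation, applies a generic GA step (binary tournament selection, arithmetic crossover, Gaussian mutation) that replaces a bat only when its child is better. The hybridization order, every parameter value, and the names `hybrid_bat_ga` and `sphere` are assumptions, not the authors' method; `sphere` is a hypothetical stand-in for a clustering fitness.

```python
import numpy as np


def sphere(x):
    """Hypothetical stand-in objective; a clustering fitness would go here."""
    return float(np.sum(x ** 2))


def hybrid_bat_ga(f, dim, n=20, iters=100, lo=-5.0, hi=5.0,
                  fmin=0.0, fmax=2.0, alpha=0.9, gamma=0.9,
                  pc=0.8, pm=0.1, seed=0):
    """Minimise f with bat-algorithm moves followed by a GA step each generation."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n, dim))      # bat positions
    v = np.zeros((n, dim))                 # bat velocities
    A = np.ones(n)                         # loudness
    r0 = np.full(n, 0.5)                   # maximum pulse emission rates
    r = r0.copy()
    fit = np.array([f(xi) for xi in x])
    b = int(np.argmin(fit))
    best, best_fit = x[b].copy(), fit[b]
    history = [best_fit]

    for t in range(1, iters + 1):
        # Bat phase: frequency-tuned flight towards the current best.
        for i in range(n):
            freq = fmin + (fmax - fmin) * rng.random()
            v[i] += (x[i] - best) * freq
            cand = np.clip(x[i] + v[i], lo, hi)
            if rng.random() > r[i]:
                # Local random walk around the best, scaled by mean loudness.
                step = 0.01 * A.mean() * rng.uniform(-1.0, 1.0, dim)
                cand = np.clip(best + step, lo, hi)
            fc = f(cand)
            if fc <= fit[i] and rng.random() < A[i]:
                x[i], fit[i] = cand, fc
                A[i] *= alpha                              # loudness decays
                r[i] = r0[i] * (1 - np.exp(-gamma * t))    # rate grows towards r0 over time
            if fc <= best_fit:
                best, best_fit = cand.copy(), fc

        # GA phase (assumed hybridization): tournament, crossover, mutation.
        children = np.empty_like(x)
        for k in range(n):
            i, j = rng.integers(0, n, 2)
            p1 = x[i] if fit[i] < fit[j] else x[j]
            i, j = rng.integers(0, n, 2)
            p2 = x[i] if fit[i] < fit[j] else x[j]
            child = p1.copy()
            if rng.random() < pc:
                w = rng.random()
                child = w * p1 + (1 - w) * p2              # arithmetic crossover
            mask = rng.random(dim) < pm
            child[mask] += rng.normal(0.0, 0.1 * (hi - lo), mask.sum())  # Gaussian mutation
            children[k] = np.clip(child, lo, hi)
        child_fit = np.array([f(c) for c in children])
        better = child_fit < fit                           # replace only on improvement
        x[better], fit[better] = children[better], child_fit[better]
        b = int(np.argmin(fit))
        if fit[b] < best_fit:
            best, best_fit = x[b].copy(), fit[b]
        history.append(best_fit)

    return best, best_fit, history
```

Calling `hybrid_bat_ga(sphere, dim=3)` returns the best position, its fitness, and the best fitness recorded after each generation; because a solution is only ever replaced by a better one, that history never increases.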

Highlights

The clustering problem is an unsupervised problem that aims to group similar data together and discover unlabeled structures in the data without any prior knowledge [1] [2]
We propose a new chromosome design that uses variable-length chromosomes to identify the optimal number of clusters without any prior knowledge
The expanded human oral microbiome database (eHOMD) provides information about bacterial species found in the human aerodigestive tract (ADT), including the nasal passages, sinuses, throat, esophagus, mouth, and lower respiratory tract. It includes a total of 775 microbial species and more than 1,000 microbial DNA sequences
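The highlights mention a variable-length chromosome that encodes the number of clusters without prior knowledge. The paper's exact encoding is not given here, so the Python sketch below assumes a design common in the automatic-clustering literature: a chromosome is a flat list of k cluster centres (k may differ between individuals), each point is assigned to its nearest centre, and fitness is the Davies-Bouldin index (lower is better). The choice of index and the names `decode`, `assign`, `davies_bouldin`, and `fitness` are assumptions for illustration only.

```python
import numpy as np


def decode(chromosome, dim):
    """Reshape a flat, variable-length chromosome into k centres of size dim."""
    genes = np.asarray(chromosome, dtype=float)
    if genes.size % dim != 0:
        raise ValueError("chromosome length must be a multiple of dim")
    return genes.reshape(-1, dim)


def assign(data, centres):
    """Label each point with the index of its nearest centre (Euclidean)."""
    d = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
    return np.argmin(d, axis=1)


def davies_bouldin(data, labels):
    """Davies-Bouldin index over the non-empty clusters; lower is better."""
    ks = np.unique(labels)
    if ks.size < 2:
        return np.inf  # a single cluster cannot be scored; treat it as worst
    cents = np.array([data[labels == k].mean(axis=0) for k in ks])
    scatter = np.array([np.linalg.norm(data[labels == k] - c, axis=1).mean()
                        for k, c in zip(ks, cents)])
    worst = []
    for i in range(ks.size):
        ratios = [(scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(ks.size) if j != i]
        worst.append(max(ratios))
    return float(np.mean(worst))


def fitness(chromosome, data):
    """Return (Davies-Bouldin index, number of non-empty clusters)."""
    centres = decode(chromosome, data.shape[1])
    labels = assign(data, centres)
    return davies_bouldin(data, labels), len(np.unique(labels))
```

For two well-separated groups of points, a two-centre chromosome scores a lower index than a three-centre chromosome that splits one group, which is how a search over variable-length chromosomes can favour the right k.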

Summary

INTRODUCTION

The clustering problem is an unsupervised problem that aims to group similar data together and discover unlabeled structures in the data without any prior knowledge [1] [2]. A method based on a pulse coupled neural network (PCNN), introduced by Xin Jin et al. [18], is applied to measure the similarity or dissimilarity of DNA sequences. Each DNA sequence is transformed into a numerical sequence using four number mapping schemes, which represent the DNA effectively without losing any genetic information. The method is adopted because it processes DNA sequences of different sizes and takes both local and global features into account.
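To make the idea concrete, the Python sketch below maps a DNA string to numbers and feeds it to a simplified one-dimensional PCNN, recording at each iteration the Shannon entropy of the binary firing pattern; two sequences are then compared by the Euclidean distance between their entropy vectors, which works even when the sequences differ in length. The four mappings shown (integer, real, EIIP, and atomic number) are commonly cited in genomic signal processing, but whether they are the four used in [18] is an assumption, as are the PCNN constants, the neighbourhood kernel, and the names `encode`, `pcnn_entropy`, and `pcnn_distance`. Jin et al.'s formulation may differ in any of these details.

```python
import numpy as np

# Commonly cited nucleotide mappings (assumed; not necessarily those of [18]).
MAPPINGS = {
    "integer": {"T": 0.0, "C": 1.0, "A": 2.0, "G": 3.0},
    "real":    {"A": -1.5, "C": 0.5, "G": -0.5, "T": 1.5},
    "eiip":    {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335},
    "atomic":  {"A": 70.0, "C": 58.0, "G": 78.0, "T": 66.0},
}


def encode(seq, scheme="eiip"):
    """Map a DNA string to a numeric vector min-max normalised to [0, 1]."""
    table = MAPPINGS[scheme]
    s = np.array([table[b] for b in seq.upper()], dtype=float)
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else np.zeros_like(s)


def binary_entropy(y):
    """Shannon entropy (bits) of a 0/1 firing map."""
    p1 = y.mean()
    if p1 in (0.0, 1.0):
        return 0.0
    return float(-p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1))


def pcnn_entropy(stimulus, iters=20, aF=0.1, aL=1.0, aT=0.3,
                 vF=0.5, vL=0.2, vT=20.0, beta=0.2):
    """Simplified 1-D PCNN; returns the firing entropy at each iteration."""
    kernel = np.array([0.5, 1.0, 0.5])  # assumed neighbourhood weights
    if stimulus.size < kernel.size:
        raise ValueError("sequence must have at least 3 bases")
    F = np.zeros_like(stimulus)
    L = np.zeros_like(stimulus)
    theta = np.ones_like(stimulus)
    Y = np.zeros_like(stimulus)
    out = []
    for _ in range(iters):
        link = np.convolve(Y, kernel, mode="same")  # input from firing neighbours
        F = np.exp(-aF) * F + vF * link + stimulus  # feeding input
        L = np.exp(-aL) * L + vL * link             # linking input
        U = F * (1 + beta * L)                      # internal activity
        Y = (U > theta).astype(float)               # pulse output
        theta = np.exp(-aT) * theta + vT * Y        # dynamic threshold
        out.append(binary_entropy(Y))
    return np.array(out)


def pcnn_distance(a, b, scheme="eiip", iters=20):
    """Dissimilarity of two DNA strings via their PCNN entropy signatures."""
    return float(np.linalg.norm(pcnn_entropy(encode(a, scheme), iters)
                                - pcnn_entropy(encode(b, scheme), iters)))
```

Because every signature has `iters` entries regardless of sequence length, `pcnn_distance("ACGTAC", "ACGTACGTAAGG")` is well defined; the resulting pairwise distances could then feed a clustering fitness.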

SOFT COMPUTING TECHNIQUES

RELATED WORK

PROPOSED SYSTEM

ENTROPY OF DNA SEQUENCES

CLUSTERING WITH GENETIC ALGORITHM

CLUSTERING WITH BAT ALGORITHM

DATA SET DESCRIPTION

SYSTEM CONFIGURATION AND PARAMETER SETTING

CONCLUSION AND FUTURE WORK

Journal: IEEE Access: Practical Innovations, Open Solutions
Publication Date: Jan 1, 2021
Citations: 2
License type: CC BY 4.0

Similar Papers

A novel combinatorial merge-split approach for automatic clustering using imperialist competitive algorithm
Zahra Aliniya ... Seyed Abolghasem Mirroshandel
Expert Systems with Applications, Vol. 117, 26 Sep 2018

Automatic clustering using nature-inspired metaheuristics: A survey
Adán José-García ... Wilfrido Gómez-Flores
Applied Soft Computing Journal, Vol. 41, 31 Dec 2015

K-Means-Based Nature-Inspired Metaheuristic Algorithms for Automatic Data Clustering Problems: Recent Advances and Future Directions
Abiodun M Ikotun ... Mubarak S Almutari
Applied Sciences, Vol. 11, 26 Nov 2021

K-Means Hybridization with Enhanced Firefly Algorithm for High-Dimension Automatic Clustering
Afroj Alam ... Muhammad Kalamuddin Ahamad
Journal of Advanced Research in Applied Sciences and Engineering Technology, Vol. 33, 09 Nov 2023
