Abstract

We propose to evaluate genome similarity by combining the discrete non-decimated wavelet transform (NDWT) and the elastic net. Wavelets represent a signal at multiple levels of detail: decomposing the signal reveals hidden components, and each level captures a different characteristic. A key feature of the elastic net is that it groups correlated variables, even when the number of predictors exceeds the number of observations. Applied to the clustering analysis of Mycobacterium tuberculosis genome strains, the combination of these two methodologies proved very effective, identifying clusters at each level of decomposition.
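The multilevel decomposition described above can be sketched as follows. This is a minimal, illustrative non-decimated transform that assumes the Haar filter, MODWT-style normalization (filters scaled by 1/√2) and periodic boundary handling. The paper's actual filter and boundary choices are not stated here, and `haar_ndwt` is a hypothetical name. Unlike the decimated DWT, no downsampling is performed: at level j the filter taps are spread 2^(j−1) positions apart, so every level keeps as many coefficients as the input has samples.

```python
def haar_ndwt(x, levels):
    """Illustrative non-decimated (MODWT-style) Haar transform, periodic boundary.

    Assumptions: Haar filter, filters scaled by 1/sqrt(2), circular indexing.
    Returns (details, smooth): details[j - 1] holds the level-j detail
    coefficients and smooth holds the level-`levels` scaling coefficients.
    Every returned list has len(x) entries (no downsampling).
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    n = len(x)
    smooth = [float(v) for v in x]
    details = []
    for j in range(1, levels + 1):
        lag = 2 ** (j - 1)  # taps spread 2^(j-1) apart instead of decimating
        prev = smooth
        # Detail: half the difference between a sample and its lagged neighbour
        details.append([(prev[t] - prev[(t - lag) % n]) / 2 for t in range(n)])
        # Smooth: half the sum, i.e. a local average passed to the next level
        smooth = [(prev[t] + prev[(t - lag) % n]) / 2 for t in range(n)]
    return details, smooth


details, smooth = haar_ndwt([1, 2, 3, 4, 5, 6, 7, 8], levels=3)
print([len(d) for d in details], smooth)
```

Because each level has the same length as the input, coefficients from different sequences can be compared position by position within a level. With this normalization, the squared coefficients across all detail levels plus the final smooth add up to the energy of the input.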

Highlights

  • Mycobacterium tuberculosis (MTB), also known as Koch's bacillus, is a pathogenic bacterial species of the genus Mycobacterium and the causative agent of most cases of tuberculosis (TB) (Taylor et al., 2003)

  • Sáfadi (2017) showed that the wavelet-domain elastic net methodology is effective for clustering time series data; that is, combining wavelets with the elastic net yields an efficient grouping method

  • The discrete non-decimated wavelet transform was applied to GC-content sequences, and the detail-level coefficients were used to study the similarity of MTB genome strains through the elastic net methodology
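A GC-content sequence turns a genome into a numeric series that a wavelet transform can process. The sketch below assumes non-overlapping windows of a fixed size and drops a trailing partial window. The window size and windowing scheme used in the paper are not given here, and `gc_content` is a hypothetical helper. Ambiguous bases such as N are counted as non-GC in this sketch.

```python
def gc_content(sequence, window):
    """Fraction of G or C bases in consecutive, non-overlapping windows.

    Illustrative assumptions: fixed window size, no overlap, trailing partial
    window discarded, case-insensitive, non-ACGT symbols counted as non-GC.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    fractions = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc = sum(1 for base in chunk if base in ("G", "C"))
        fractions.append(gc / window)
    return fractions


print(gc_content("GGCCAATTGATC", 4))
```

Shorter windows give a longer, noisier series; longer windows smooth out local variation before any wavelet decomposition is applied.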


Summary

Introduction

Mycobacterium tuberculosis (MTB), also known as Koch's bacillus, is a pathogenic bacterial species of the genus Mycobacterium and the causative agent of most cases of tuberculosis (TB) (Taylor et al., 2003). Sáfadi (2017) showed that the wavelet-domain elastic net methodology is effective for clustering time series data; that is, combining wavelets with the elastic net yields an efficient grouping method. A further advantage of the method is the speed at which analyses are processed. Cho et al. (2009) proposed a simple stepwise procedure that simultaneously identifies disease-causing SNPs in rheumatoid arthritis by employing elastic net regularization, a variable selection method that can handle multicollinearity, showing that the elastic net can be applied efficiently to genetic data. Here, the discrete non-decimated wavelet transform was applied to GC-content sequences, and the detail-level coefficients were used to study the similarity of MTB genome strains through the elastic net methodology. The proposed methodology was applied to ten MTB sequences: two drug-resistant, six drug-susceptible, one multidrug-resistant, and one extensively drug-resistant.
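As a rough illustration of elastic net regularization, the sketch below minimizes (1/2n)‖y − Xβ‖² + λ[α‖β‖₁ + (1 − α)/2 ‖β‖²] by cyclic coordinate descent with soft-thresholding. This parameterization and solver follow the common glmnet convention and are assumptions; the paper's exact formulation and software are not given here. The data are assumed centered (no intercept), and `elastic_net` and `soft_threshold` are hypothetical helper names.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, max_sweeps=10_000, tol=1e-12):
    """Illustrative cyclic coordinate descent for
    (1/2n)||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).

    X is a list of n rows with p entries each; y is assumed centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = [float(v) for v in y]  # y - X @ beta, with beta = 0 initially
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            denom = col_sq[j] + lam * (1 - alpha)
            if denom == 0.0:  # all-zero column under a pure lasso penalty: keep b_j = 0
                continue
            # Correlation of column j with the partial residual (b_j's own term added back)
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / denom
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync with beta
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Columns 0 and 1 are identical (perfect multicollinearity); column 2 is unrelated to y.
X = [[1, 1, 1], [-1, -1, 1], [2, 2, 0], [-2, -2, 0]]
y = [1, -1, 2, -2]
print(elastic_net(X, y, lam=0.1, alpha=0.5))
```

With two identical predictors, the ridge term makes the objective strictly convex, so the fit splits the weight equally between them instead of arbitrarily keeping one, as a pure lasso may do. This is the grouping behaviour that makes the elastic net suitable for correlated predictors.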

Material and Methods
Results and Discussion
Conclusions
