Abstract
Motivation
Genomic Signal Processing, which transforms biomolecular sequences into discrete signals for spectral analysis, has provided valuable insights into DNA sequence, structure, and evolution. However, challenges persist in building spectral representations of variable-length sequences for tasks such as species classification, and in interpreting these spectra to identify discriminative DNA regions.

Results
We introduce SpecGMM, a novel framework that integrates sliding window-based Spectral analysis with a Gaussian Mixture Model (GMM) to transform variable-length DNA sequences into fixed-dimensional spectral representations for taxonomic classification. SpecGMM's hyperparameters were selected using a dataset of plant sequences and applied unchanged across diverse datasets, including mitochondrial DNA, viral and bacterial genomes, and 16S rRNA sequences. Across these datasets, SpecGMM outperformed a baseline method, improving the test accuracy of a Linear Discriminant classifier by 9.45% on average and by up to 35.55%. Regarding interpretability, SpecGMM revealed discriminative hypervariable regions in 16S rRNA sequences (particularly V3/V4 for discriminating higher taxa and V2/V3 for lower taxa), corroborating their known relevance for classification. SpecGMM's spectrogram video analysis helped visualize species-specific DNA signatures. SpecGMM thus provides a robust and interpretable method for spectral DNA analysis, opening new avenues in genomic signal processing research.

Availability and implementation
SpecGMM's source code is available at https://github.com/BIRDSgroup/SpecGMM.
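The abstract describes the pipeline only at a high level, so the sketch below illustrates one plausible reading of it and is not SpecGMM's actual implementation. The purine/pyrimidine mapping, window length 32, step 8, Hann taper, three diagonal-covariance components, and the use of weight-ordered component means as the feature vector are all assumptions, as are the helper names (`encode`, `window_spectra`, `fit_diag_gmm`, `specgmm_like_features`). What it is meant to show is why the output length depends only on the window size and number of mixture components, not on the sequence length.

```python
import numpy as np

# Assumed numeric mapping (purine = -1, pyrimidine = +1); SpecGMM's actual mapping may differ.
PP_MAP = {"A": -1.0, "G": -1.0, "C": 1.0, "T": 1.0}


def encode(seq: str) -> np.ndarray:
    """Map a DNA string to a real-valued signal; unknown bases become 0."""
    return np.array([PP_MAP.get(b, 0.0) for b in seq.upper()])


def window_spectra(signal: np.ndarray, window: int = 32, step: int = 8) -> np.ndarray:
    """Magnitude spectrum of each sliding window: shape (n_windows, window // 2 + 1)."""
    starts = range(0, len(signal) - window + 1, step)
    frames = np.stack([signal[s:s + window] for s in starts])
    frames = frames * np.hanning(window)  # taper each window to reduce spectral leakage
    return np.abs(np.fft.rfft(frames, axis=1))


def fit_diag_gmm(X: np.ndarray, k: int = 3, iters: int = 50, seed: int = 0):
    """Plain EM for a Gaussian mixture with diagonal covariances."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)]      # initialise means at random windows
    var = np.tile(X.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: log-likelihood of each window under each component, plus log weight
        diff2 = (X[:, None, :] - means[None, :, :]) ** 2  # shape (n, k, d)
        log_p = -0.5 * (diff2 / var + np.log(2 * np.pi * var)).sum(axis=2) + np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)         # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)           # responsibilities, rows sum to 1
        # M-step: re-estimate weights, means, and variances from responsibilities
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        diff2 = (X[:, None, :] - means[None, :, :]) ** 2
        var = np.einsum("nk,nkd->kd", resp, diff2) / nk[:, None] + 1e-6  # variance floor
    return weights, means, var


def specgmm_like_features(seq: str, window: int = 32, step: int = 8, k: int = 3) -> np.ndarray:
    """Fixed-length vector: GMM component means ordered by mixing weight, concatenated."""
    spectra = window_spectra(encode(seq), window, step)
    weights, means, _ = fit_diag_gmm(spectra, k=k)
    order = np.argsort(-weights)                          # hypothetical canonical component order
    return means[order].ravel()                           # length k * (window // 2 + 1)
```

Because every sequence yields a vector of length k × (window // 2 + 1), vectors from sequences of different lengths can be stacked into one matrix and passed to a linear discriminant classifier such as scikit-learn's `LinearDiscriminantAnalysis`. Ordering components by mixing weight is only a simple, hypothetical way to align components across independently fitted mixtures. With these defaults a sequence needs at least k windows, that is, at least 48 bases.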