Protein sequence classification plays a pivotal role in bioinformatics because it helps explain protein functions and their involvement in diverse biological processes. Numerous machine learning models have been proposed for this task, but traditional approaches struggle to capture the intricate relationships and hierarchical structures inherent in genomic sequences. These limitations stem from operating in high-dimensional, non-Euclidean spaces. To address this issue, we introduce an elliptic geometry-based approach to protein sequence classification. First, we formulate the problem in elliptic geometry and combine it with the Gaussian kernel to obtain a Mercer kernel. The resulting Gaussian-Elliptic approach implicitly maps the data into a higher-dimensional feature space, which allows it to capture complex nonlinear relationships. This property is particularly advantageous for the hierarchical or tree-like structures commonly encountered in biological sequences. Experimental results demonstrate the effectiveness of the proposed model in protein sequence classification and the advantages of using elliptic geometry in bioinformatics analyses. The model outperforms state-of-the-art methods, achieving accuracies of 76% and 84% on the DNA and protein datasets, respectively. Furthermore, we provide theoretical justifications for the proposed model. This study contributes to the growing field of geometric deep learning and offers insights into the potential applications of elliptic representations in the analysis of biological data.
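To make the idea of combining elliptic geometry with a Gaussian kernel concrete, the following is a minimal sketch and not the authors' formulation. It assumes that each sequence has already been embedded as a nonzero numeric vector, that the elliptic distance between two vectors is the angle between the lines they span (so antipodal directions are identified), and that this distance is plugged into the usual Gaussian form. The function names and the `sigma` bandwidth parameter are hypothetical, and the sketch does not verify the Mercer (positive semidefinite) property that the abstract claims for the proposed kernel.

```python
import math


def elliptic_distance(x, y):
    """Angle between the lines spanned by x and y (assumed elliptic distance).

    Taking the absolute value of the cosine identifies antipodal points,
    so the result always lies in [0, pi/2].
    """
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("zero vectors have no direction")
    cos_angle = abs(sum(a * b for a, b in zip(x, y))) / (norm_x * norm_y)
    cos_angle = min(1.0, cos_angle)  # guard against rounding above 1
    return math.acos(cos_angle)


def gaussian_elliptic_kernel(x, y, sigma=1.0):
    """Gaussian function of the assumed elliptic distance (illustrative only)."""
    d = elliptic_distance(x, y)
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


def gram_matrix(vectors, sigma=1.0):
    """Pairwise kernel values for a list of embedded sequences."""
    return [
        [gaussian_elliptic_kernel(u, v, sigma) for v in vectors]
        for u in vectors
    ]
```

A Gram matrix built this way could be passed to any classifier that accepts a precomputed kernel, although whether that is appropriate depends on the kernel being positive semidefinite for the chosen bandwidth and data, which should be checked separately.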