Abstract

In the present article we propose the application of variants of the mutual information function as characteristic fingerprints of biomolecular sequences for classification analysis. In particular, we consider the resolved mutual information functions based on the Shannon, Rényi, and Tsallis entropies. In combination with interpretable machine learning classifier models based on generalized learning vector quantization, a powerful methodology for sequence classification is achieved which allows substantial knowledge extraction in addition to the high classification ability resulting from model-inherent robustness. Any potentially (slightly) inferior performance of the classifier is compensated for by the additional knowledge provided by interpretable models. This knowledge may assist the user in analyzing and understanding the data and the considered task. After a theoretical justification of the concepts, we demonstrate the approach on various example data sets covering different areas of biomolecular sequence analysis.
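
For orientation (this recap is ours, not part of the abstract): for a discrete distribution $p = (p_1, \dots, p_n)$, the three entropies referenced above are standardly defined as

$$ H(p) = -\sum_i p_i \log p_i, \qquad H^{R}_{\alpha}(p) = \frac{1}{1-\alpha}\,\log \sum_i p_i^{\alpha}, \qquad H^{T}_{\alpha}(p) = \frac{1}{\alpha-1}\Bigl(1 - \sum_i p_i^{\alpha}\Bigr), $$

where both generalized entropies recover the Shannon entropy $H$ in the limit $\alpha \to 1$.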

Highlights

  • The accumulation of information based on physical organization processes such as structure generation and self-organization belongs to the key aspects of living systems [1,2,3,4]. Information-theoretic concepts play an important role in sequence analysis for understanding biomolecular entities like RNA, DNA, and proteins and, thus, for explaining biological systems [5,6,7].

  • The authors of this study proposed, based on the earlier work in [40], a partial information correlation. This criticism, together with the above-mentioned robustness observations for the Rényi and Tsallis entropy estimators, motivated our investigations: first, we introduce a resolved variant of the Shannon-based mutual information functions (MIFs) as a more adequate information-theoretic signature of molecular sequences, reducing the averaging effects.

  • We introduce the concept of variants of mutual information functions, which later serve as determining fingerprints of nucleotide sequences.


Summary

Introduction

The accumulation of information based on physical organization processes such as structure generation and self-organization belongs to the key aspects of living systems [1,2,3,4]. The authors of this study proposed, based on the earlier work in [40], a partial information correlation. This criticism, together with the above-mentioned robustness observations for the Rényi and Tsallis entropy estimators, motivated our investigations: first, we introduce a resolved variant of the Shannon-based MIF as a more adequate information-theoretic signature of molecular sequences, reducing the averaging effects. Afterwards, we transfer this concept to both the Rényi and the Tsallis variants, obtaining the respective (resolved) mutual information functions. Concluding remarks and an outlook on future work complete the paper.
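
To make the fingerprint idea concrete, the following minimal sketch (ours, illustrative only; all names are hypothetical and this is not the authors' implementation) estimates the mutual information between symbols at distance k from empirical pair frequencies via I(k) = H(X) + H(Y) - H(X,Y) and stacks the values over a range of distances into a fingerprint vector:

from collections import Counter
from math import log

def entropy_shannon(p):
    # Shannon entropy in bits of a probability list p (all entries > 0)
    return -sum(q * log(q, 2) for q in p)

def entropy_renyi(p, alpha):
    # Rényi alpha-entropy; recovers the Shannon entropy for alpha -> 1
    return log(sum(q ** alpha for q in p), 2) / (1.0 - alpha)

def entropy_tsallis(p, alpha):
    # Tsallis alpha-entropy; recovers the Shannon entropy for alpha -> 1
    return (1.0 - sum(q ** alpha for q in p)) / (alpha - 1.0)

def probs(counts):
    # Turn a Counter into a list of relative frequencies
    n = sum(counts.values())
    return [c / n for c in counts.values()]

def mif(seq, k, H):
    # Plug-in estimate of I(k) = H(X) + H(Y) - H(X, Y) for the symbol
    # pairs (seq[i], seq[i + k]); assumes len(seq) > k
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    p_x = probs(Counter(a for a, _ in pairs))
    p_y = probs(Counter(b for _, b in pairs))
    p_xy = probs(Counter(pairs))
    return H(p_x) + H(p_y) - H(p_xy)

seq = "ACGTACGTAAGGCCTTACGTACGT"
# Shannon-based MIF fingerprint over distances k = 1..5
print([mif(seq, k, entropy_shannon) for k in range(1, 6)])
# Rényi-based variant (alpha = 2) via the same additive combination
print([mif(seq, k, lambda p: entropy_renyi(p, 2.0)) for k in range(1, 6)])

Note that for the Rényi and Tsallis entropies the additive combination H(X) + H(Y) - H(X,Y) is only one of several possible generalizations of mutual information and need not coincide with the definitions derived in the paper.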

Variants of Mutual Information Functions as Biomolecular Sequence Signatures
The Resolved Mutual Information Function Based on the Shannon Entropy
Rényi α-Entropy and Related Mutual Information Functions
Tsallis α-Entropy and Related Mutual Information Functions
Interpretable Classification Learning by Learning Vector Quantization
Applications of Mutual Information Functions for Sequence Classification
Quadruplex Detection
COVID Types
Feature Generation
Natural Vectors
Mutual Information Functions
Handling of Ambiguous Characters
Classification
Classification Performance
Visualization of MIF Variants
Interpretation of CCM and CIP of the Trained LiRaM-LVQ Model