DNA and RNA are nucleotide-based molecules that are essential to several biological processes affecting both normal and cancerous cells. They carry the genetic material needed for normal cell growth and function. The structural patterns in DNA that make up the genetic code affect how cells grow, behave, and are regulated, and different patterns indicate different physiological effects in the cell. Knowledge of these patterns is necessary to identify the molecular origins of cancer and other disorders, and analyzing them can support early detection of disease, which is essential to effective cancer research and therapy. The novelty of this study lies in examining dinucleotide structure patterns across several genomic regions: the non-coding region sequence (N-CDS), the coding region sequence (CDS), and the whole raw DNA sequence (W.R. sequence). It provides an in-depth discussion of the dinucleotide patterns found in these genetic contexts and covers both malignant and non-malignant DNA sequences. Markovian modeling of dinucleotide probabilities also reduces feature complexity and lowers computational cost compared with the Kernelized Logistic Regression (KLR) and Support Vector Machine (SVM) approaches. The technique is evaluated on case studies using accuracy metrics and 10-fold cross-validation. The classifier and the feature reduction derived from Markovian probabilities work well together and can help predict cancer. Our findings distinguish DNA sequences related to cancer from those associated with non-cancerous diseases by analyzing the W.R. DNA sequence as well as its CDS and N-CDS regions.
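To make the idea of Markovian dinucleotide probabilities concrete, the following is a minimal sketch of how a first-order Markov model could turn a DNA sequence into a fixed 16-value feature vector. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `pseudocount` parameter, and the choice to skip pairs containing symbols other than A, C, G, and T are all hypothetical.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def dinucleotide_transition_probs(seq, pseudocount=0.0):
    """Estimate first-order Markov transition probabilities P(next | current).

    Hypothetical helper: returns a dict keyed by dinucleotide, e.g. "AC"
    maps to the estimated probability that C follows A.
    """
    seq = seq.upper()
    # Count overlapping adjacent pairs; pairs containing a symbol outside
    # ACGT (for example N) are skipped, which is an assumption of this sketch.
    counts = Counter(
        (a, b)
        for a, b in zip(seq, seq[1:])
        if a in ALPHABET and b in ALPHABET
    )
    probs = {}
    for a in ALPHABET:
        # Normalise each row so the probabilities of the four successors sum to 1.
        row_total = sum(counts[(a, b)] for b in ALPHABET) + pseudocount * len(ALPHABET)
        for b in ALPHABET:
            if row_total == 0:
                # The base never appears with a successor and no smoothing is used.
                probs[a + b] = 0.0
            else:
                probs[a + b] = (counts[(a, b)] + pseudocount) / row_total
    return probs


def feature_vector(seq, pseudocount=0.0):
    """Flatten the 4 x 4 transition table into 16 values in AA, AC, ..., TT order."""
    probs = dinucleotide_transition_probs(seq, pseudocount)
    return [probs[a + b] for a, b in product(ALPHABET, repeat=2)]
```

Under these assumptions, each sequence (whole raw, CDS, or N-CDS) is reduced to 16 numbers regardless of its length, which is one way a compact feature set of this kind could be passed to a classifier and scored with 10-fold cross-validation.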