Classifying promoters by interpreting the hidden information of DNA sequences for disease prediction in clinical laboratories using Gaussian decision boundary estimation

Pradeepa S,Ahmed Alkhayyat,Vimal Shanmuganathan,Subbulakshmi P,Niveda Gaspar,Kaliappan M

doi:10.3233/idt-230283

Abstract

A promoter is a short stretch of DNA (100–1,000 bp) where RNA polymerase begins transcribing a gene. A DNA (deoxyribonucleic acid) base pair is the fundamental unit of DNA structure: two complementary nucleotide bases paired within the DNA double helix. The four DNA nucleotide bases are adenine (A), thymine (T), cytosine (C), and guanine (G). Base pairs are the building blocks of the DNA molecule, and their complementary pairing is central to the storage and transmission of genetic information in all living organisms. A promoter normally lies at the 5′ end of the transcription initiation site or immediately upstream of it. Numerous human disorders, particularly diabetes, cancer, and Huntington’s disease, have been shown to have DNA promoters as their root cause. The scientific community has long sought crucial information about protein-coding genes, and finding promoters is the first step in finding genes in DNA sequences. Consequently, promoter identification has become a challenge that has drawn the interest of many researchers in bioinformatics. We propose Gaussian Decision Boundary Estimation in machine learning models to detect transcription start sites (promoters) in the DNA sequences of a common bacterium, Escherichia coli. A score-based function selects the nucleotides most directly responsible for promoter recognition, in order to maximise the models’ performance. The Gaussian Decision Boundary Estimation based support-vector-machine model is trained on these features and finds the best hyperplane separating the data into classes. In this study, promoter regions were identified with an accuracy of 99.9%, which is higher than that of other state-of-the-art algorithms. A second major emphasis of this paper is a comparison of machine learning classification models to identify the one that most accurately predicts promoters in DNA sequences. The work provides analysis for further biological research as well as precision medicine.
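The score-based selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the score function, so mutual information between the nucleotide at each position and the class label is used here purely as an assumption. The function names (`position_scores`, `top_k_positions`, `one_hot`) and the toy sequences are hypothetical and are not taken from the paper or from E. coli data.

```python
from collections import Counter
from math import log2

BASES = "ACGT"

def position_scores(seqs, labels):
    """Mutual information (in bits) between the base at each position and the label."""
    n = len(seqs)
    label_counts = Counter(labels)
    scores = []
    for pos in range(len(seqs[0])):
        base_counts = Counter(s[pos] for s in seqs)
        joint = Counter((s[pos], y) for s, y in zip(seqs, labels))
        mi = 0.0
        for (base, y), count in joint.items():
            p_xy = count / n
            p_x = base_counts[base] / n
            p_y = label_counts[y] / n
            mi += p_xy * log2(p_xy / (p_x * p_y))
        scores.append(mi)
    return scores

def top_k_positions(scores, k):
    """Indices of the k highest-scoring positions, returned in sequence order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

def one_hot(seq, positions):
    """One-hot encode only the selected positions (4 values per position, order A, C, G, T)."""
    vec = []
    for p in positions:
        vec.extend(1 if seq[p] == b else 0 for b in BASES)
    return vec

# Toy data, invented for illustration: label 1 = promoter, 0 = non-promoter.
seqs = ["TAGC", "TACG", "TAGG", "GCGC", "GCCG", "GCGG"]
labels = [1, 1, 1, 0, 0, 0]

scores = position_scores(seqs, labels)
selected = top_k_positions(scores, k=2)
features = [one_hot(s, selected) for s in seqs]
```

In this toy set only the first two positions differ between the classes, so they receive the highest scores. The resulting feature vectors could then be passed to a support-vector-machine classifier, for example one with a Gaussian (RBF) kernel; whether that corresponds exactly to the Gaussian Decision Boundary Estimation model in the paper is not stated in the abstract.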
