Abstract
<p indent="0mm">Methylation is an important epigenetic modification that plays a key role in regulating gene expression and in the occurrence and development of cancers. Accurately identifying methylated sites in DNA and RNA is the basis for studying the biological functions of methylation. The rapid development of high-throughput sequencing technology has led to the accumulation of large amounts of DNA/RNA sequence data, and machine learning has consequently become an important method for predicting methylation sites. Feature-encoding algorithms extract information from DNA/RNA sequences and encode it as numerical features that discriminate between classes, which are then used to build a machine learning model for predicting methylation sites. The choice of feature-encoding algorithm is therefore a key factor in training a well-performing model. This study systematically surveyed 40 feature-encoding algorithms commonly used in the published literature on DNA/RNA methylation site prediction models and grouped them into seven categories according to the principles underlying their calculation. The 40 algorithms were evaluated and compared on benchmark and independent datasets of RNA m<sup>6</sup>A modification in three species (<italic>S</italic>.<italic> cerevisiae</italic>, <italic>H</italic>.<italic> sapiens</italic>, and mouse) and on a DNA 4mC modification dataset of <italic>A</italic>.<italic> thaliana</italic>. Finally, future directions are proposed for DNA/RNA sequence feature-encoding algorithms and for machine learning models that predict biological sites.
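To make the idea of feature encoding concrete, the following is a minimal sketch of two generic sequence encodings: a one-hot (binary) encoding and a k-mer frequency encoding. Both are chosen here purely for illustration; the abstract does not name the 40 surveyed algorithms, so this should not be read as the authors' implementation. The function names `one_hot` and `kmer_frequency`, the RNA alphabet `ACGU`, and the example sequence are assumptions.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; use "ACGT" for DNA


def one_hot(seq, alphabet=ALPHABET):
    """Encode each nucleotide as an indicator vector and concatenate them."""
    index = {base: i for i, base in enumerate(alphabet)}
    features = []
    for base in seq.upper():
        vec = [0] * len(alphabet)
        if base in index:  # unknown symbols (e.g. N) stay all-zero
            vec[index[base]] = 1
        features.extend(vec)
    return features


def kmer_frequency(seq, k=2, alphabet=ALPHABET):
    """Return the relative frequency of every possible k-mer in the sequence."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    total = len(seq) - k + 1  # number of sliding windows of length k
    if total <= 0:
        return [0.0] * len(kmers)
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(total):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing unknown symbols
            counts[kmer] += 1
    return [counts[kmer] / total for kmer in kmers]


if __name__ == "__main__":
    window = "GGACU"  # hypothetical sequence window around a candidate site
    print(len(one_hot(window)))         # 5 positions x 4 symbols
    print(len(kmer_frequency(window)))  # 4**2 possible dinucleotides
```

Either feature vector could then be passed to a classifier. The one-hot form preserves position within the window, whereas the k-mer form summarises composition and has a fixed length regardless of the window size.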