Abstract

Motivation: Pentatricopeptide repeat (PPR) is a triangular pentapeptide repeat domain that plays a vital role in plant growth. In this study, we seek to identify PPR coding genes and proteins using a mixture of feature extraction methods. We use four single feature extraction methods focusing on the sequence, physical, and chemical properties as well as the amino acid composition, and mix the features. The Max-Relevant-Max-Distance (MRMD) technique is applied to reduce the feature dimension. Classification uses the random forest, J48, and naïve Bayes with 10-fold cross-validation.Results: Combining two of the feature extraction methods with the random forest classifier produces the highest area under the curve of 0.9848. Using MRMD to reduce the dimension improves this metric for J48 and naïve Bayes, but has little effect on the random forest results.Availability and Implementation: The webserver is available at: http://server.malab.cn/MixedPPR/index.jsp.

Highlights

  • Pentatricopeptide repeat (PPR) proteins include tandem repeats of degenerate 35-amino-acid motifs (PPR motifs) (Chen et al, 2018; Rojas et al, 2018)

  • We discuss the performance of each individual feature extraction method

  • The four feature extraction methods focus on different aspects. 188D considers information about the sequence composition and amino acid properties, whereas kmer considers the frequency of amino acid fragments in the sequence

Read more

Summary

Introduction

Pentatricopeptide repeat (PPR) proteins include tandem repeats of degenerate 35-amino-acid motifs (PPR motifs) (Chen et al, 2018; Rojas et al, 2018) They form a class of nuclear-encoded proteins arranged in series by multiple repeating units (Li and Jiang, 2018). Chou et al (Chou, 2010) proposed a method based on the pseudo amino acid composition (Pse-AAC) that has since been used to predict various protein attributes, such as structural class (Sahu and Panda, 2010; Zhu et al, 2018), subcellular location (Wang et al, 2008b; Yang et al, 2016), essential protein (Sarangi et al, 2013), protein secondary structural content

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call