Abstract

Extracellular matrix (ECM) proteins play an important role in a range of cellular biological processes, and studying them helps to clarify their biological functions. We propose ECMP-RF (extracellular matrix proteins prediction by random forest) to predict ECM proteins. First, features are extracted from each protein sequence by combining encoding based on grouped weight, pseudo amino-acid composition, a pseudo position-specific scoring matrix, a local descriptor, and an autocorrelation descriptor. Second, the synthetic minority oversampling technique (SMOTE) is used to handle the class imbalance in the data, and the elastic net (EN) is used to reduce the dimension of the feature vectors. Finally, a random forest (RF) classifier is used to predict ECM proteins. Leave-one-out cross-validation gives balanced accuracies of 97.3% on the training dataset and 97.9% on the testing dataset. Compared with other state-of-the-art predictors, ECMP-RF performs significantly better.
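The abstract names five sequence encodings without giving their formulas. As a hedged illustration of the general idea of turning a variable-length protein sequence into a fixed-length vector, the sketch below computes two simpler relatives: the 20-dimensional amino-acid composition (the base that pseudo amino-acid composition extends) and a normalized Moreau-Broto autocorrelation over a single property scale. The Kyte-Doolittle hydropathy values, the lag range, and the function names `aac`, `autocorrelation`, and `feature_vector` are assumptions for illustration, not the exact ECMP-RF encodings.

```python
# Illustrative sketch only: two simple sequence descriptors, concatenated.
# Assumption: Kyte-Doolittle hydropathy is used as the property scale purely
# for illustration; it is not claimed to be the property set used by ECMP-RF.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def aac(seq: str) -> list[float]:
    """Fraction of each of the 20 standard residues (other letters are ignored)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(a) / n for a in AMINO_ACIDS]


def autocorrelation(seq: str, scale: dict[str, float], max_lag: int) -> list[float]:
    """Normalized Moreau-Broto autocorrelation AC(d) = 1/(N-d) * sum_i P_i * P_(i+d)."""
    # Standardize the scale to zero mean and unit variance over the 20 residues.
    values = [scale[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    norm = {a: (scale[a] - mean) / std for a in AMINO_ACIDS}

    p = [norm[r] for r in seq.upper() if r in norm]
    feats = []
    for d in range(1, max_lag + 1):
        if len(p) <= d:
            feats.append(0.0)  # sequence too short for this lag
            continue
        total = sum(p[i] * p[i + d] for i in range(len(p) - d))
        feats.append(total / (len(p) - d))
    return feats


def feature_vector(seq: str, max_lag: int = 5) -> list[float]:
    """Concatenate descriptors into one fixed-length vector (20 + max_lag values)."""
    return aac(seq) + autocorrelation(seq, KYTE_DOOLITTLE, max_lag)
```

For example, `feature_vector("MKTAYIAKQR")` returns 25 numbers regardless of sequence length, which is what lets sequences of different lengths be fed to one classifier.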
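SMOTE creates synthetic minority-class samples rather than duplicating existing ones: it picks a minority sample, picks one of its k nearest minority neighbours, and places a new point at a random position on the line segment between them. A minimal standard-library sketch follows. The function name `smote`, the default k = 5, and the random choice of base sample (a common variant; the original formulation cycles through every minority sample) are assumptions, not details taken from ECMP-RF.

```python
import math
import random


def smote(minority: list[list[float]], n_synthetic: int, k: int = 5,
          seed: int = 0) -> list[list[float]]:
    """Return n_synthetic new points, each interpolated between a minority
    sample and one of its k nearest minority-class neighbours (Euclidean)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest neighbours of x among the other minority samples.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

To balance two classes, one would call `smote(minority, len(majority) - len(minority))` and append the result to the minority class. Because every new point lies between two real minority samples, the synthetic data stay inside the region the minority class already occupies.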
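The elastic net step can be written as a penalized regression whose sparse solution selects features. The objective below is the standard form used, for example, by glmnet for a continuous response; whether ECMP-RF uses this squared loss or a logistic loss, and which values of $\lambda \ge 0$ and $\alpha \in [0, 1]$ it uses, is not stated here, so treat these as assumptions. Feature $j$ is kept when its fitted coefficient $\hat{\beta}_j$ is non-zero.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \beta_0 - \mathbf{x}_i^{\top}\beta\right)^2
  + \lambda\left[\alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\right]
```

With $\alpha = 1$ the penalty reduces to the lasso and with $\alpha = 0$ to ridge regression; intermediate values keep the lasso's sparsity, which is what removes features, while the ridge term tends to keep or drop groups of correlated features together.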
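Balanced accuracy is the mean of sensitivity and specificity, $\mathrm{BACC} = (S_n + S_p)/2$ with $S_n = TP/(TP+FN)$ and $S_p = TN/(TN+FP)$, so it is not inflated by a large negative class. The sketch below scores leave-one-out cross-validation with that metric. The random forest is replaced by a hypothetical 1-nearest-neighbour rule (`nearest_neighbour`) only to keep the example dependency-free; `leave_one_out` and `balanced_accuracy` are likewise illustrative names, not ECMP-RF code.

```python
import math
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], int]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """BACC = (Sn + Sp) / 2, with label 1 = ECM protein, 0 = non-ECM protein."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p != 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p != 0)
    sensitivity = tp / (tp + fn)  # requires at least one positive sample
    specificity = tn / (tn + fp)  # requires at least one negative sample
    return (sensitivity + specificity) / 2


def nearest_neighbour(X_train: List[Sequence[float]], y_train: List[int]) -> Predictor:
    """Hypothetical stand-in for the random forest: a 1-nearest-neighbour rule."""
    def predict(x: Sequence[float]) -> int:
        j = min(range(len(X_train)), key=lambda k: math.dist(x, X_train[k]))
        return y_train[j]
    return predict


def leave_one_out(X: List[Sequence[float]], y: List[int],
                  train: Callable[[List[Sequence[float]], List[int]], Predictor]) -> float:
    """Hold out each sample once, train on the rest, then score all held-out predictions."""
    preds = []
    for i in range(len(X)):
        predict = train(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(X[i]))
    return balanced_accuracy(y, preds)
```

Usage is `leave_one_out(X, y, nearest_neighbour)`; using a random forest instead means writing a `train` function that fits one on the training split and returns its prediction function. If oversampling is combined with this loop, applying SMOTE only to each training split keeps synthetic points derived from the held-out sample out of training.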

Highlights

  • The extracellular matrix is a network of macromolecules synthesized by animal cells and distributed on the cell surface or between cells [1,2]

  • Feature extraction was performed on the protein sequences of the training dataset

  • Extracellular matrix (ECM) proteins participate in a variety of biological processes, and they play an important role in cell life activities

Introduction

The extracellular matrix (ECM) is a network of macromolecules synthesized by animal cells and distributed on the cell surface or between cells [1,2]. The ECM provides structural support to cells within the tumor, as well as anchorage and tissue separation. It has a cohesive effect that mediates communication between cells, and it contributes survival and differentiation signals [3,4]. ECM proteins actively promote essential cellular processes such as differentiation, proliferation, adhesion, migration, and apoptosis [5,6,7,8]. Defects in ECM proteins are associated with many human diseases (such as cancer, atherosclerosis, asthma, fibrosis, and arthritis), and studying modified ECM proteins helps to explain complex pathologies [10,11]. Research on ECM proteins also contributes to the development of novel cell-adhesive biomaterials that are critical in many medical fields, such as cell therapy and tissue engineering [12,13]. The diversity of ECM proteins contributes to the diversity of ECM function;

Mathematics 2020, 8, 169; doi:10.3390/math8020169 www.mdpi.com/journal/mathematics
