Abstract

Type IV secretion systems (T4SS) are multi-protein complexes, found in a number of bacterial pathogens, that can translocate proteins and DNA into host cells. Most T4SSs function in conjugation and translocate DNA; however, approximately 13% secrete proteins, delivering effector proteins into the cytosol of eukaryotic host cells. Once inside, these effectors manipulate the host cell's machinery to the pathogen's benefit, which can result in serious illness or death of the host. For this reason, identifying T4SS effectors has become an important research problem. Much previous work has focused on verifying effectors experimentally, which is costly in money, time, and effort. Good predictions of effectors can help focus experimental validation and reduce testing costs. In recent years, several scoring-based and machine learning-based methods have been proposed for predicting T4SS effector proteins. These methods use different sets of features, and their predictions have been inconsistent. In this paper, an optimal set of features for predicting T4SS effector proteins is derived using a statistical approach. A thorough literature search was performed to collect previously proposed features. Feature values were calculated for datasets of known effectors and non-effectors from four T4SS-containing pathogens with a sufficient number of known effectors: Legionella pneumophila, Coxiella burnetii, Brucella spp., and Bartonella spp. The features were ranked, and less important features were filtered out. Correlated features among those remaining were removed, and dimensionality reduction was performed using principal component analysis and factor analysis. Finally, the optimal features for each pathogen were chosen by building and evaluating logistic regression models.
Evaluation of the logistic regression models confirms the effectiveness of the four pathogen-specific optimal feature sets, and based on these, a general optimal feature set is proposed for all T4SS effector proteins.
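The first stage of the pipeline (rank features, filter out the less important ones, then remove correlations among the survivors) is stated without the ranking statistic or the correlation cutoff. The Python sketch below is one minimal way to carry it out, assuming ranking by the absolute point-biserial correlation with the effector label, a hypothetical top-`keep` cutoff, and a hypothetical |r| > 0.8 redundancy threshold. The names `pearson`, `rank_and_filter`, and `feat_a`…`feat_d` are illustrative, and the toy values are invented, not data from the study.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (point-biserial if y is 0/1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    return cov / (sx * sy)


def rank_and_filter(features, labels, keep=3, max_corr=0.8):
    """Rank features by |r| with the label, keep the top `keep`, then greedily
    drop any feature whose |r| with an already-selected feature exceeds max_corr.

    Assumption: the ranking statistic and both thresholds are hypothetical choices.
    """
    score = {name: abs(pearson(vals, labels)) for name, vals in features.items()}
    ranked = sorted(score, key=score.get, reverse=True)[:keep]
    selected = []
    for name in ranked:
        if all(abs(pearson(features[name], features[s])) <= max_corr for s in selected):
            selected.append(name)
    return selected


# Toy data (hypothetical, not from the paper): 4 effectors (1), 4 non-effectors (0).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "feat_a": [4, 5, 5, 6, 1, 2, 2, 3],  # informative
    "feat_b": [4, 4, 5, 6, 2, 2, 2, 3],  # informative but nearly redundant with feat_a
    "feat_c": [3, 1, 3, 1, 2, 0, 2, 0],  # weaker, largely independent signal
    "feat_d": [1, 2, 1, 2, 2, 1, 2, 1],  # no signal
}

print(rank_and_filter(features, labels))  # → ['feat_a', 'feat_c']
```

Greedy removal keeps the better-ranked member of each correlated pair, so redundancy is resolved in favor of the feature that is more informative about the effector label.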

Highlights

  • The type IV secretion system (T4SS) is a protein complex that delivers DNA and proteins to the host cell

  • Our goal was to determine an optimal set of features for predicting all T4SS effector proteins, so we worked with datasets from several pathogens

  • We then applied principal component analysis (PCA) and factor analysis to the features to reduce their dimensionality and eliminate correlation and redundancy among them. The resulting factors were used to build logistic regression models for selecting an informative group of features
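The step of reducing correlated features to components and feeding them into logistic regression can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: it keeps only the first principal component (found by power iteration on the correlation matrix of standardized features), omits factor analysis entirely, fits the regression by simple batch gradient descent, and scores the model on the same toy data it was trained on. All function names and data are hypothetical.

```python
import math


def standardize(rows):
    """Z-score each column (sample SD) so every feature is on the same scale."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def first_component(z, iters=200):
    """Leading eigenvector (and eigenvalue) of the correlation matrix, by power iteration.

    Assumption: only PC1 is retained; factor analysis is not shown here.
    """
    n, p = len(z), len(z[0])
    corr = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]
    v = [1.0] * p  # start vector; assumes it is not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * corr[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigval


def fit_logistic(x, y, lr=0.1, epochs=2000):
    """One-predictor logistic regression fitted by batch gradient descent."""
    b0 = b1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += p - yi
            g1 += (p - yi) * xi
        b0 -= lr * g0 / len(x)
        b1 -= lr * g1 / len(x)
    return b0, b1


# Toy data (hypothetical, not from the paper): 3 correlated features per protein.
rows = [[4, 5, 6], [5, 4, 5], [5, 6, 4], [6, 5, 5],   # effectors
        [2, 1, 0], [1, 2, 1], [1, 0, 2], [0, 1, 1]]   # non-effectors
labels = [1, 1, 1, 1, 0, 0, 0, 0]

z = standardize(rows)
component, eigval = first_component(z)
scores = [sum(a * b for a, b in zip(r, component)) for r in z]  # PC1 score per protein
b0, b1 = fit_logistic(scores, labels)
predicted = [1 if b0 + b1 * s >= 0 else 0 for s in scores]
accuracy = sum(p == y for p, y in zip(predicted, labels)) / len(labels)
print(f"PC1 explains {eigval / len(component):.0%} of variance; accuracy {accuracy:.2f}")
```

Because the sign of a principal component is arbitrary, the fitted slope `b1` absorbs its orientation. In a real evaluation the model would be scored on held-out proteins rather than on its training data.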


Summary

Introduction

The type IV secretion system (T4SS) is a protein complex that delivers DNA and proteins to the host cell. Proteins secreted by the T4SS are known as effectors and act as agents of virulence and pathogenesis. They alter the host cell environment to make it more hospitable to the pathogen, allowing the bacteria to replicate [3]. Before their function can be studied, effectors must be identified, and this remains a major challenge because experimental identification and verification are costly in both time and money. With the advent of machine learning, researchers have turned to scoring methods [4] and machine learning algorithms [5,6,7,8] to predict effector proteins from the genomes or proteomes of pathogens. When predictions are highly accurate, experimental verification can be focused on the strongest candidates and performed much more efficiently.
