Abstract
Background: The availability of sequence data of human pathogenic fungi generates opportunities to develop bioinformatics tools and resources for vaccine development to benefit at-risk patients.
Description: We have developed a fungal adhesin predictor and an immunoinformatics database of predicted adhesins. Based on a literature search and domain analysis, we prepared a positive dataset comprising adhesin protein sequences from the human fungal pathogens Candida albicans, Candida glabrata, Aspergillus fumigatus, Coccidioides immitis, Coccidioides posadasii, Histoplasma capsulatum, Blastomyces dermatitidis, Pneumocystis carinii, Pneumocystis jirovecii and Paracoccidioides brasiliensis. The negative dataset consisted of proteins with a high probability of functioning intracellularly. We used 3945 compositional properties, including frequencies of mono, doublet, triplet and multiplets of amino acids and hydrophobic properties, as input features of protein sequences to a Support Vector Machine. The best classifiers were identified through an exhaustive search of 588 parameters, selecting those with the best Matthews Correlation Coefficient (MCC) and the lowest coefficient of variation among the three-fold cross-validation datasets. The "FungalRV adhesin predictor" was built on three models whose average MCC was in the range 0.89-0.90 and whose coefficient of variation across the three-fold cross-validation datasets was in the range 1.2%-2.74% at a threshold score of 0. We obtained an overall MCC value of 0.8702 considering all 8 pathogens, namely C. albicans, C. glabrata, A. fumigatus, B. dermatitidis, C. immitis, C. posadasii, H. capsulatum and P. brasiliensis, thus showing high sensitivity and specificity at a threshold of 0.511. In the case of P. brasiliensis, the algorithm achieved a sensitivity of 66.67%. A total of 307 fungal adhesins and adhesin-like proteins were predicted from the entire proteomes of eight human pathogenic fungal species. The predicted adhesin sequences were processed through 18 immunoinformatics algorithms, and the resulting data were organized into a MySQL backend. A user-friendly Web interface was developed so that experimental researchers can retrieve and analyze information from the database.
Conclusion: The FungalRV webserver, which facilitates the discovery process for novel human pathogenic fungal adhesin vaccines, has been developed.
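The compositional input features mentioned in the abstract can be illustrated with a minimal Python sketch. It computes only single-residue ("mono") and dipeptide ("doublet") frequencies; it does not reproduce the full set of 3945 properties, and the function names and the example sequence are hypothetical, chosen purely for illustration.

```python
from itertools import product

# The 20 standard amino acids in a fixed order, so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_frequencies(sequence, k):
    """Return relative frequencies of all 20**k amino-acid k-mers, in a fixed order."""
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows containing non-standard residues
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def composition_features(sequence):
    """Concatenate mono (20 values) and doublet (400 values) frequencies."""
    return kmer_frequencies(sequence, 1) + kmer_frequencies(sequence, 2)


# Hypothetical toy sequence, not taken from the datasets described above.
features = composition_features("MKLSAAKLLS")
print(len(features))  # 420
```

A vector of this kind, one per protein, is the sort of fixed-length numeric input that a Support Vector Machine classifier expects; the triplet, multiplet and hydrophobicity features described above would be appended in the same way.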
Highlights
The availability of sequence data of human pathogenic fungi generates opportunities to develop bioinformatics tools and resources for vaccine development to benefit at-risk patients. Description: We have developed a fungal adhesin predictor and an immunoinformatics database with predicted adhesins
Certain non-life-threatening superficial and respiratory infections caused by dimorphic pathogenic fungi like C. immitis, H. capsulatum, P. brasiliensis and B. dermatitidis impose significant restrictions on patients, resulting in a reduced quality of life
Predicts the subcellular location of eukaryotic proteins based on the predicted presence of any of the N-terminal presequences: chloroplast transit peptide, mitochondrial targeting peptide or secretory pathway signal peptide (SP) [64]
Summary
As cases of immunosuppression rise, the spectrum of fungal pathogens is increasing, posing a serious threat to human health. We present an algorithm developed using a Support Vector Machine trained on a combination of 3945 compositional properties for classifying human pathogenic fungal adhesins and adhesin-like proteins. After removing the sequences corresponding to the human fungal pathogens, we obtained 74 sequences from Pichia spp, Debaryomyces spp, Saccharomyces spp, Lachancea spp, Schizosaccharomyces spp, Kluyveromyces spp, Zygosaccharomyces spp, Neosartorya spp, Talaromyces spp, Botryotinia spp, Nectria spp, Metarhizium spp, Verticillium spp, Emericella spp, Vanderwaltozyma spp, Beauveria spp, Trichoderma spp, and Magnaporthe spp. In this case, a different combination of models with high MCC and low coefficient of variation appears appropriate, identifying 61 of the 74 adhesins and giving a high sensitivity of 82.43%.
Table columns: Best model (classifier); Kernel type; Parameters selected; Performance of best model (MCC) in the selected subset; Mean MCC for parameters across three subsets; CV for parameters across three subsets.
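The evaluation measures used above (sensitivity, MCC, and the coefficient of variation of MCC across three subsets) can be worked through in a short Python sketch. The sensitivity figure uses the 61 of 74 count from the text; the per-fold MCC values are hypothetical placeholders, and the use of the sample standard deviation for the coefficient of variation is an assumption, since the source does not state which form was used.

```python
import math
import statistics


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def sensitivity(tp, fn):
    """Fraction of true positives that were correctly identified."""
    return tp / (tp + fn)


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean (assumed form)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


# From the text: 61 of 74 adhesins identified, so 13 were missed.
sens_percent = 100.0 * sensitivity(tp=61, fn=13)
print(round(sens_percent, 2))  # 82.43

# Hypothetical MCC values for three cross-validation subsets (not from the paper).
fold_mccs = [0.88, 0.90, 0.91]
print(statistics.mean(fold_mccs), coefficient_of_variation(fold_mccs))
```

Under the selection criteria described above, a parameter setting would be preferred when its mean MCC is high and its coefficient of variation across the three subsets is low, meaning the classifier performs consistently regardless of which subset is held out.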