Abstract

The knowledge of protein functions plays an essential role in understanding biological cells and has a significant impact on human life in areas such as personalized medicine, better crops and improved therapeutic interventions. Due to the expense and inherent difficulty of biological experiments, intelligent methods are generally relied upon for the automatic assignment of functions to proteins. Technological advancements in biology are improving our understanding of biological processes and regularly yield new features and characteristics that better describe the role of proteins. These anticipated features are inevitably neglected and overlooked when designing more effective classification techniques. A key issue in this context, one that has not been sufficiently addressed, is how to build effective classification models and approaches for protein function prediction that incorporate and take advantage of ever-evolving biological information. In this article, we propose a three-way decision making approach that makes provision for seeking and incorporating future information. We consider probabilistic rough set models such as Game-Theoretic Rough Sets (GTRS) and Information-Theoretic Rough Sets (ITRS) for inducing three-way decisions. An architecture for protein function classification with three-way decisions based on probabilistic rough sets is proposed and explained. Experiments are carried out on a Saccharomyces cerevisiae dataset obtained from the UniProt database, with the corresponding functional classes extracted from the Gene Ontology (GO) database. The results indicate that as the level of biological information increases, the number of deferred cases is reduced while a similar level of accuracy is maintained.

Highlights

  • All living organisms are composed of cells, which are intricately arranged chemical factories that obtain matter from their environment and use this raw matter to generate copies of themselves [1]

  • We focus on three-way decisions with probabilistic rough sets

  • To demonstrate the use of three-way decisions for protein function classification, we focus on two probabilistic rough set models, namely, Game-Theoretic Rough Sets (GTRS) [25,26,27] and Information-Theoretic Rough Sets (ITRS) [59]


Summary

Introduction

All living organisms are composed of cells, which are intricately arranged chemical factories that obtain matter from their environment and use this raw matter to generate copies of themselves [1]. A general assumption, not explicitly stated, when developing classification approaches is that the information is fixed (i.e., not dynamic and evolving). This assumption may not always be useful. Consider, for instance, the classification of proteins whose functions cannot be precisely identified because the associated biological information is lacking (although we may anticipate it in the future), which leads to compromised results. To address this issue, i.e., to incorporate the anticipated future information into the predictive task, we propose a three-way decision making approach that includes a decision option of deferment. The code (Python/Bash/Matlab) and data files used in this work are available as a zip file (“Protein_Functions_TWD_data_code.zip”) from http://tinyurl.com/jdpwkkq
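The deferment option can be illustrated with a minimal Python sketch of the standard three-way decision rule over a pair of thresholds. It is not taken from the authors' zip file. The function name, the protein identifiers, the probabilities and the threshold values (alpha = 0.8, beta = 0.3) are all hypothetical; in the approach described here, the thresholds would instead be determined by a model such as GTRS or ITRS.

```python
def three_way_decision(prob: float, alpha: float = 0.8, beta: float = 0.3) -> str:
    """Return 'accept', 'reject' or 'defer' for one protein/function pair.

    prob  : estimated conditional probability that the protein has the function
    alpha : acceptance threshold (illustrative value, not from the paper)
    beta  : rejection threshold (illustrative value, not from the paper)
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    if prob >= alpha:
        return "accept"   # enough evidence to assign the function
    if prob <= beta:
        return "reject"   # enough evidence to rule the function out
    return "defer"        # wait for further biological information


# Hypothetical probabilities for three proteins and a single function class.
probabilities = {"protein_a": 0.92, "protein_b": 0.55, "protein_c": 0.10}
decisions = {name: three_way_decision(p) for name, p in probabilities.items()}

for name, decision in decisions.items():
    print(name, decision)
```

Under this rule, a protein whose probability falls strictly between beta and alpha is deferred rather than forced into a definite class, and it can be re-evaluated once additional information changes its estimated probability.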

