Abstract
Background: Alterations of a genome can lead to changes in protein function. Through these genetic mutations, a protein can lose its native function (loss-of-function, LoF) or acquire a new function (gain-of-function, GoF). However, when a mutation occurs, it is difficult to determine whether it will result in LoF or GoF. In this paper, we therefore analyze the genomic features of LoF and GoF instances to find features that can be used to classify LoF and GoF mutations.

Methods: To collect experimentally verified LoF and GoF mutations, we obtained 816 LoF mutations and 474 GoF mutations through literature text mining. After data preprocessing, 258 LoF and 129 GoF mutations remained for further analysis. We analyzed the properties of these LoF and GoF mutations. From these properties, we selected features that show different tendencies between the two groups and trained support vector machine, random forest, and linear logistic regression classifiers to confirm whether these features can distinguish LoF from GoF mutations.

Results: We identified six features with discriminative power between LoF and GoF mutations: the reference allele, the substituted allele, mutation type, mutation impact, subcellular location, and protein domain. Using these six features, the random forest, support vector machine, and linear logistic regression classifiers achieved accuracies of 72.23%, 71.28%, and 70.19%, respectively.

Conclusions: We analyzed LoF and GoF mutations and selected several properties that differ between the two classes. Classification with the selected features demonstrates that they have good discriminative power.
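The classification setup described in the Methods can be illustrated with a minimal sketch of how six categorical features might be turned into numeric vectors and how an accuracy figure could be put in context. This is not the authors' pipeline: the feature names follow the abstract, but the record values, the helper names (`build_vocabulary`, `one_hot`, `majority_baseline`), and the choice of one-hot encoding are assumptions made for illustration. The baseline also assumes accuracy was measured on the 258 LoF / 129 GoF set without rebalancing, which the abstract does not state.

```python
# Feature names follow the abstract; everything else here is hypothetical.
FEATURES = [
    "reference_allele",
    "substituted_allele",
    "mutation_type",
    "mutation_impact",
    "subcellular_location",
    "protein_domain",
]

# Made-up placeholder records, NOT data from the study.
records = [
    {"reference_allele": "G", "substituted_allele": "A", "mutation_type": "SNP",
     "mutation_impact": "missense", "subcellular_location": "nucleus",
     "protein_domain": "kinase", "label": "LoF"},
    {"reference_allele": "C", "substituted_allele": "T", "mutation_type": "SNP",
     "mutation_impact": "nonsense", "subcellular_location": "cytoplasm",
     "protein_domain": "kinase", "label": "GoF"},
]


def build_vocabulary(rows):
    """Map each (feature, value) pair seen in the rows to a column index."""
    pairs = sorted({(f, row[f]) for row in rows for f in FEATURES})
    return {pair: i for i, pair in enumerate(pairs)}


def one_hot(row, vocab):
    """Encode one record as a 0/1 vector; values not in vocab are left as 0."""
    vec = [0] * len(vocab)
    for f in FEATURES:
        idx = vocab.get((f, row[f]))
        if idx is not None:
            vec[idx] = 1
    return vec


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(n_lof, n_gof):
    """Accuracy obtained by always predicting the larger class."""
    return max(n_lof, n_gof) / (n_lof + n_gof)


vocab = build_vocabulary(records)
vectors = [one_hot(r, vocab) for r in records]

# Class counts after preprocessing, as reported in the abstract.
baseline = majority_baseline(258, 129)
```

Vectors of this kind could be passed to any standard classifier. Under the stated assumption about the evaluation set, the majority-class baseline gives a reference point against which the reported accuracies can be read.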
Highlights
Alterations of a genome can lead to changes in protein functions
We analyzed LoF and GoF mutations and selected several properties which were different between the two classes
Classification with the selected features demonstrates that they have good discriminative power
Summary
Because proteins are generated and regulated based on the genome sequence, alterations of the genome can lead to changes in protein function [1]. Through these genetic mutations, a protein can lose its native function (loss-of-function, LoF) or acquire a new function (gain-of-function, GoF) [2,3,4,5]. The B-SIFT algorithm calculates scores for mutant alleles based on evolutionary conservation information [3]. Its authors used these scores to identify mutations that cause hyperactivation or gain-of-function outcomes, whereas our work uses the functional effects of mutations and several other properties. Most previous studies focused on either LoF or GoF mutations, or on functional changes in a specific gene.