Abstract

p53 is an important tumor suppressor protein. Mutated p53 has been found in many kinds of human cancers, and reactivating it to restore active p53 can lead to tumor regression. In recent years, an increasing amount of data has been extracted from biophysical simulations, so modelling the transcriptional activity of mutant p53 now has to cope with a huge number of instances and a high feature dimension. Incremental feature extraction is an effective way to analyze large-scale data; however, most current incremental feature extraction methods are not suited to big data with high feature dimension. Partial Least Squares (PLS) has been demonstrated to be an effective dimension reduction technique for classification. In this paper, we design a highly efficient algorithm named Incremental Partial Least Squares (IPLS), which extracts features in two stages. In the first stage, the PLS target function is made incremental by updating the historical mean, and the leading projection direction is extracted. In the second stage, the remaining projection directions are calculated through the equivalence between the PLS vectors and the Krylov sequence. We compare IPLS with several state-of-the-art incremental feature extraction methods, namely Incremental Principal Component Analysis, Incremental Maximum Margin Criterion and Incremental Inter-class Scatter, on real p53 protein data. Empirical results show that IPLS outperforms these methods in terms of balanced classification accuracy.
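The abstract describes the two stages only at a high level. The Python/NumPy sketch below illustrates the two ideas it names, under stated assumptions, and is not the authors' implementation. It assumes a single response (binary labels coded as +1/-1). For stage 1, it keeps running sums so that the centered cross-covariance X_c^T y_c, which is proportional to the leading PLS1 direction, can be refreshed as data chunks arrive. For stage 2, it uses the standard result that the PLS1 weight vectors span the Krylov space generated by S = X_c^T X_c and s = X_c^T y_c, then orthonormalizes that sequence. The class name `IncrementalPLSSketch`, its methods, and the choice to store the full d x d second-moment sum are hypothetical simplifications. Storing that matrix would not suit very high feature dimensions.

```python
import numpy as np


class IncrementalPLSSketch:
    """Hypothetical streaming sketch of two-stage PLS1 direction extraction.

    Stage 1: running sums give the centered cross-covariance s = X_c^T y_c,
    whose normalized form is the leading PLS1 direction.
    Stage 2: the remaining directions come from orthonormalizing the Krylov
    sequence s, S s, S^2 s, ... with S = X_c^T X_c.
    """

    def __init__(self, n_features):
        self.n = 0
        self.sum_x = np.zeros(n_features)
        self.sum_y = 0.0
        self.sum_xy = np.zeros(n_features)
        # Simplifying assumption: the d x d second-moment sum fits in memory.
        self.sum_xx = np.zeros((n_features, n_features))

    def partial_fit(self, X, y):
        """Absorb one chunk of samples; earlier chunks are never revisited."""
        X = np.asarray(X, dtype=float)
        y = np.asarray(y, dtype=float)
        self.n += X.shape[0]
        self.sum_x += X.sum(axis=0)
        self.sum_y += y.sum()
        self.sum_xy += X.T @ y
        self.sum_xx += X.T @ X
        return self

    def _centered_moments(self):
        # Re-center with the updated historical means:
        # sum (x - mx)(y - my) = sum x*y - n*mx*my, likewise for x x^T.
        mean_x = self.sum_x / self.n
        mean_y = self.sum_y / self.n
        s = self.sum_xy - self.n * mean_x * mean_y
        S = self.sum_xx - self.n * np.outer(mean_x, mean_x)
        return s, S

    def directions(self, k):
        """Return a d x k matrix of orthonormal PLS1 directions (up to sign)."""
        s, S = self._centered_moments()
        K = np.empty((s.shape[0], k))
        v = s / np.linalg.norm(s)  # stage 1: leading direction
        for j in range(k):
            K[:, j] = v
            v = S @ v
            v = v / np.linalg.norm(v)  # rescaling keeps the Krylov span unchanged
        Q, _ = np.linalg.qr(K)  # stage 2: orthonormalize the Krylov sequence
        return Q


# Usage on synthetic data (not the p53 data), fed in chunks of 50 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)
model = IncrementalPLSSketch(n_features=6)
for start in range(0, 200, 50):
    model.partial_fit(X[start:start + 50], y[start:start + 50])
W = model.directions(3)
Z = (X - X.mean(axis=0)) @ W  # projected features for a downstream classifier
```

Keeping sums rather than per-chunk estimates makes each update independent of chunk order and size. Because the Krylov vectors are orthonormalized in order, each column of `W` agrees with the corresponding batch PLS1 weight vector up to sign.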

