Abstract

Background: Bioinformatics data analysis often uses a linear mixture model that represents samples as additive mixtures of components. Properly constrained blind matrix factorization methods extract those components using the mixture samples only. However, automatic selection of the extracted components to be retained for classification analysis remains an open issue.

Results: The method proposed here is applied to well-studied protein and genomic datasets of ovarian, prostate and colon cancers to extract components for disease prediction. It achieves average sensitivities of 96.2% (sd = 2.7%), 97.6% (sd = 2.8%) and 90.8% (sd = 5.5%) and average specificities of 93.6% (sd = 4.1%), 99% (sd = 2.2%) and 79.4% (sd = 9.8%) in 100 independent two-fold cross-validations.

Conclusions: We propose an additive mixture model of a sample for feature extraction using, in principle, sparseness-constrained factorization on a sample-by-sample basis. In contrast, existing methods factorize the complete dataset simultaneously. The sample model is composed of a reference sample representing the control and/or case (disease) groups and a test sample. Each sample is decomposed into two or more components that are selected automatically (without using label information) as control specific, case specific and not differentially expressed (neutral). The number of components is determined by cross-validation. Automatic assignment of features (m/z ratios or genes) to a particular component is based on thresholds estimated directly from each sample. Because the decomposition is local, the strength of expression of each feature can vary across the samples, yet the features will still be allocated to the related disease and/or control specific component. Since label information is not used in the selection process, case and control specific components can be used for classification, which is not the case with standard factorization methods. Moreover, the component selected by the proposed method as disease specific can be interpreted as a sub-mode and retained for further analysis to identify potential biomarkers. Unlike with standard matrix factorization methods, this can be achieved on a sample (experiment)-by-sample basis. Postulating one or more components with indifferent features enables their removal from the disease and control specific components on a sample-by-sample basis. This yields selected components with reduced complexity and generally increases prediction accuracy.
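
The Results figures are means and standard deviations of sensitivity and specificity over repeated two-fold cross-validation. A minimal sketch of how such summaries can be computed is given here. It is not the authors' evaluation code: the stratified split, the per-fold averaging, the toy `midpoint_classifier` and all function names are assumptions made for illustration only.

```python
import random
import statistics

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity); label 1 = case, 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def stratified_two_fold(labels, rng):
    """Split indices into two folds, halving each class separately.

    Assumption: every class has at least two samples, so both folds
    contain cases and controls.
    """
    folds = ([], [])
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        half = len(idx) // 2
        folds[0].extend(idx[:half])
        folds[1].extend(idx[half:])
    return folds

def repeated_two_fold(features, labels, fit_predict, repeats=100, seed=0):
    """Mean and sd of per-fold sensitivity and specificity (assumed averaging)."""
    rng = random.Random(seed)
    sens, spec = [], []
    for _ in range(repeats):
        a, b = stratified_two_fold(labels, rng)
        for train, test in ((a, b), (b, a)):
            pred = fit_predict([features[i] for i in train],
                               [labels[i] for i in train],
                               [features[i] for i in test])
            s, p = sensitivity_specificity([labels[i] for i in test], pred)
            sens.append(s)
            spec.append(p)
    return (statistics.mean(sens), statistics.stdev(sens),
            statistics.mean(spec), statistics.stdev(spec))

def midpoint_classifier(train_x, train_y, test_x):
    """Hypothetical stand-in classifier: threshold at the midpoint of class means."""
    m0 = statistics.mean(x for x, y in zip(train_x, train_y) if y == 0)
    m1 = statistics.mean(x for x, y in zip(train_x, train_y) if y == 1)
    cut = (m0 + m1) / 2
    return [1 if (x > cut) == (m1 > m0) else 0 for x in test_x]

# Toy, perfectly separable one-dimensional data (made up for the example).
toy_features = [0, 1, 2, 3, 10, 11, 12, 13]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
summary = repeated_two_fold(toy_features, toy_labels, midpoint_classifier)
print(summary)
```

In practice `fit_predict` would be replaced by a classifier trained on the extracted case and control specific components; the scalar toy feature is only there to keep the sketch self-contained.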

Highlights

  • Bioinformatics data analysis often uses a linear mixture model that represents samples as additive mixtures of components

  • This work presents a feature extraction/component selection method based on an innovative additive linear mixture model of a sample and sparseness-constrained factorization that operates on a sample (experiment)-by-sample basis

  • Each sample is decomposed into several components selected automatically, without using label information, as disease specific, control specific and not differentially expressed


Summary

Introduction

Bioinformatics data analysis often uses a linear mixture model that represents samples as additive mixtures of components. The method proposed here decomposes each sample (experiment) into components comprising up-regulated, down-regulated and not differentially expressed features using data-adaptive thresholds. These thresholds are based on the mixing angles of an innovative linear mixture model of a sample. We want to emphasize that the component selected as disease specific by the method proposed here can be interpreted as a sub-mode and used for a similar type of analysis. Since it is extracted from an individual, labelled sample, it can be used for classification as well. Features do not have to be strongly expressed across the whole dataset in order to be selected as part of the disease or case specific components, because the decomposition is performed locally (on a sample-by-sample basis).
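
To picture the role of mixing angles, consider a feature's expression in the reference sample and in the test sample as a point in the plane: its angle is near 45 degrees when the two levels are similar and moves toward 0 or 90 degrees when one sample dominates. The sketch below groups features by that angle. It is an illustration only, not the sparseness-constrained factorization itself: the fixed `margin_deg`, the group names and the toy values are hypothetical, whereas the method described here estimates its thresholds from each sample. Non-negative expression levels are assumed.

```python
import math

def mixing_angle(reference_value, test_value):
    """Angle of the point (reference, test); 45 degrees means equal levels."""
    return math.atan2(test_value, reference_value)

def assign_features(reference, test, margin_deg=15.0):
    """Group feature indices by mixing angle.

    margin_deg is a hypothetical fixed margin around 45 degrees; it stands
    in for the data-adaptive thresholds, which are not reproduced here.
    """
    low = math.radians(45.0 - margin_deg)
    high = math.radians(45.0 + margin_deg)
    groups = {"reference_dominant": [], "neutral": [], "test_dominant": []}
    for index, (r, t) in enumerate(zip(reference, test)):
        if r == 0 and t == 0:
            # No expression in either sample: treat as neutral (assumption).
            groups["neutral"].append(index)
            continue
        angle = mixing_angle(r, t)
        if angle > high:
            groups["test_dominant"].append(index)
        elif angle < low:
            groups["reference_dominant"].append(index)
        else:
            groups["neutral"].append(index)
    return groups

# Toy expression levels for four features (made up for the example).
reference_sample = [1.0, 1.0, 4.0, 0.0]
test_sample = [1.0, 5.0, 1.0, 0.0]
groups = assign_features(reference_sample, test_sample)
print(groups)
```

With a control reference, the test-dominant group would correspond to features that are up-regulated in the test sample and the reference-dominant group to down-regulated ones; with a case reference the roles are reversed.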

