Abstract
Background: As microarray technology has matured and become widely used, selecting a small number of relevant genes for accurate classification of samples has become a prominent topic in biostatistics and bioinformatics. However, most existing algorithms cannot handle multiple classes, arguably a common application. Here, we propose an extension to an existing regularization algorithm, Threshold Gradient Descent Regularization (TGDR), that specifically tackles multi-class classification of microarray data. When several microarray experiments address the same or similar objectives, one option is to use a meta-analysis version of TGDR (Meta-TGDR), which treats the classification task as a combination of classifiers with the same structure/model while allowing the parameters to vary across studies. However, the original Meta-TGDR extension did not offer a solution for prediction on independent samples. Here, we propose an explicit method to estimate the overall coefficients of the biomarkers selected by Meta-TGDR. This extension permits broader applicability and allows a comparison between the predictive performance of Meta-TGDR and TGDR using an independent testing set.
Results: Using real-world applications, we demonstrated that the proposed multi-TGDR framework works well and that it selects fewer genes than the individual binary TGDRs combined. Additionally, Meta-TGDR and TGDR applied to the batch-effect-adjusted pooled data provided approximately the same results. Adding a bagging procedure in each application ensured stability and good predictive performance.
Conclusions: Compared with Meta-TGDR, TGDR is less computationally intensive and does not require samples from all classes in each study. On the adjusted data, its predictive performance is approximately the same as that of Meta-TGDR. Thus, it is highly recommended.
Highlights
Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in areas of biomedical research and translational medicine
The proposed algorithm, the Meta Threshold Gradient Descent Regularization (Meta-TGDR), assumes that the same set of genes is selected in all studies, while allowing the β coefficients to vary across studies, in a meta-analysis fashion
As criticized by Wang et al. [22], lack of parsimony is an obvious disadvantage of TGDR algorithms, a shortcoming inherited by multi-TGDR
Summary
Biomarker discovery from high-dimensional data is a crucial problem with enormous applications in biomedical research and translational medicine. Selecting a small number of relevant features (e.g., genes in transcriptomics profiles, SNPs in GWAS, and metabolites in metabolomics) to build a predictive model that can accurately classify samples by their diagnosis (e.g., diseased or healthy, different stages of one specific cancer) and prognosis (e.g., potential response to a given treatment, 5-year survival with a certain treatment) is an essential step towards personalized medicine. In bioinformatics, such a task is accomplished by a feature selection algorithm, which, besides reducing overfitting and improving classification accuracy, leads to small molecular signatures that are manageable to verify experimentally and may support the design of cheap, dedicated diagnostic and prognostic tools. The extension of Meta-TGDR proposed here permits broader applicability and allows a comparison between the predictive performance of Meta-TGDR and TGDR using an independent testing set.
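The text above names TGDR as the feature selection algorithm but does not state its update rule. As a rough illustration only, the sketch below shows the thresholded gradient step as it is commonly described for binary logistic regression: at each iteration, only coefficients whose gradient magnitude is at least a fraction `tau` of the largest gradient magnitude are updated. The function name `tgdr_logistic`, the absence of an intercept, and the default values of `tau`, `step`, and `n_iter` are assumptions made for this example; the multi-class and meta-analysis extensions discussed here are not shown.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tgdr_logistic(X, y, tau=0.8, step=0.01, n_iter=200):
    """Illustrative threshold gradient descent for binary logistic regression.

    X: list of samples, each a list of feature values (no intercept column).
    y: list of 0/1 class labels.
    tau: threshold in [0, 1]; larger values update fewer coefficients.
    step, n_iter: step size and iteration count (hypothetical defaults).
    """
    p = len(X[0])
    beta = [0.0] * p  # every coefficient starts at zero
    for _ in range(n_iter):
        # Gradient of the log-likelihood: sum_i (y_i - p_i) * x_ij
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            resid = yi - _sigmoid(eta)
            for j in range(p):
                grad[j] += resid * xi[j]
        gmax = max(abs(g) for g in grad)
        if gmax == 0.0:
            break  # nothing left to update
        # Thresholding: move only coefficients with a comparatively large gradient.
        for j in range(p):
            if abs(grad[j]) >= tau * gmax:
                beta[j] += step * grad[j]
    return beta


# Toy data: feature 0 separates the classes, feature 1 carries no signal.
X_toy = [[1.0, 0.1], [2.0, -0.1], [-1.0, 0.1], [-2.0, -0.1]]
y_toy = [1, 1, 0, 0]
beta_toy = tgdr_logistic(X_toy, y_toy, tau=1.0)
```

With `tau=1.0` only the coefficient with the largest gradient moves at each step, which gives the sparsest fits; with `tau=0.0` every coefficient is updated and the procedure reduces to ordinary gradient ascent on the log-likelihood. In practice both `tau` and the number of iterations would be tuned, for example by cross-validation.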