Multiclass Classification Problem of Large-Scale Biomedical Meta-Data

Sebastian Student, Justyna Pieter, Krzysztof Fujarewicz

doi:10.1016/j.protcy.2016.01.093

Open Access

https://doi.org/10.1016/j.protcy.2016.01.093

Journal: Procedia Technology	Publication Date: Jan 1, 2016
Citations: 7	License type: cc-by-nc-nd

Affiliation: Silesian University of Technology

Abstract

Classification is one of the important data mining tasks in biomedical research. Recent advances in biomedicine provide new opportunities for molecular biology, such as measuring the activity of thousands of molecular tissue biomarkers. For example, we can use gene expression data measured by DNA microarrays or RNA-Seq, DNA methylation levels measured by DNA methylation microarrays, or protein and phosphoprotein levels measured by reverse phase protein arrays. A major problem in applying large-scale genomic and proteomic data to classification is the dimensionality of these data. In this work, we propose a novel multiclass feature selection and classification system for merged data from different molecular biomedical techniques. However, when these data are merged, the biggest problem is the huge number of features combined with a limited number of samples. For that reason, the feature selection step is crucial in high-dimensional data classification. Our results show that integrated analysis with proper feature selection and classification techniques applied to large-scale meta-data can improve both classification accuracy and the feature selection stability index. We have shown that, for the same number of selected features, merged data yield significantly higher classification accuracy than a single-technique dataset.
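
The abstract does not say which feature selection or classification methods the system uses, so the following is only a minimal sketch of the general workflow it describes: merge per-sample measurements from several platforms, rank the merged features, keep the top k, and train a multiclass classifier. The ANOVA F-score ranking, the nearest-centroid classifier, the function names, and the toy values are all assumptions made for illustration. They are not the authors' method or data.

```python
from collections import defaultdict
from statistics import mean

def merge_by_sample(*platforms):
    """Concatenate features for samples measured on every platform."""
    shared = set(platforms[0])
    for p in platforms[1:]:
        shared &= set(p)
    return {sid: {k: v for p in platforms for k, v in p[sid].items()}
            for sid in sorted(shared)}

def f_score(values, classes):
    """One-way ANOVA F statistic of one feature (assumed ranking criterion)."""
    groups = defaultdict(list)
    for v, c in zip(values, classes):
        groups[c].append(v)
    df_between, df_within = len(groups) - 1, len(values) - len(groups)
    if df_between == 0:
        return 0.0
    grand = mean(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / df_between) / (within / df_within)

def select_top_k(data, labels, k):
    """Return the k feature names with the highest F score."""
    sids = sorted(data)
    classes = [labels[s] for s in sids]
    scored = [(f_score([data[s][n] for s in sids], classes), n)
              for n in sorted(data[sids[0]])]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [n for _, n in scored[:k]]

def fit_centroids(data, labels, features):
    """Per-class mean vector over the selected features."""
    rows = defaultdict(list)
    for sid, row in data.items():
        rows[labels[sid]].append([row[f] for f in features])
    return {c: [mean(col) for col in zip(*r)] for c, r in rows.items()}

def predict(centroids, row, features):
    """Assign the class whose centroid is closest (squared Euclidean)."""
    x = [row[f] for f in features]
    return min(sorted(centroids),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

# Made-up toy values: two platforms, three classes, sample s7 lacks methylation data.
expression = {
    "s1": {"geneA": 1.0, "geneB": 3.0}, "s2": {"geneA": 1.2, "geneB": 2.0},
    "s3": {"geneA": 5.0, "geneB": 2.5}, "s4": {"geneA": 5.1, "geneB": 3.1},
    "s5": {"geneA": 5.0, "geneB": 2.2}, "s6": {"geneA": 4.9, "geneB": 2.9},
    "s7": {"geneA": 3.0, "geneB": 2.0},
}
methylation = {
    "s1": {"cpg1": 0.10, "cpg2": 0.50}, "s2": {"cpg1": 0.20, "cpg2": 0.40},
    "s3": {"cpg1": 0.15, "cpg2": 0.60}, "s4": {"cpg1": 0.10, "cpg2": 0.50},
    "s5": {"cpg1": 0.90, "cpg2": 0.45}, "s6": {"cpg1": 0.80, "cpg2": 0.55},
}
labels = {"s1": "A", "s2": "A", "s3": "B", "s4": "B", "s5": "C", "s6": "C"}

merged = merge_by_sample(expression, methylation)
selected = select_top_k(merged, labels, k=2)
centroids = fit_centroids(merged, labels, selected)
new_sample = {"geneA": 5.0, "geneB": 2.5, "cpg1": 0.85, "cpg2": 0.5}
predicted = predict(centroids, new_sample, selected)
```

In this toy setup the expression feature alone cannot separate classes B and C, while the merged feature set can. This is only meant to illustrate why integrating platforms may help. It says nothing about the effect sizes reported in the paper.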

Full Text