Abstract
MicroRNAs (miRNAs) play important roles in a variety of biological processes and can act as disease biomarkers. Thus, establishing methods to discover disease-related miRNAs is warranted. Human omics data, including miRNA expression profiles, typically contain orders of magnitude more descriptors (p) than samples (n), the so-called “p ≫ n problem”. Because traditional statistical methods can be misled toward localized solutions in this setting, machine learning (ML) methods that perform sparse variable selection are expected to address this problem. Among the many ML methods, the least absolute shrinkage and selection operator (LASSO) and multivariate adaptive regression splines (MARS) select a small number of variables through supervised learning against endpoints such as human disease status. Here, we systematically compared LASSO and MARS for biomarker discovery using six miRNA expression data sets of human disease samples obtained from the NCBI Gene Expression Omnibus (GEO). We additionally performed partial least squares discriminant analysis (PLS-DA) as a traditional control method to establish the baseline performance of discriminant methods. LASSO and MARS showed relatively higher performance than PLS-DA as the number of samples increased. In addition, some of the miRNA species identified by the ML methods have previously been reported as candidate disease biomarkers in biological studies. These findings should extend our knowledge of how ML methods perform when applied to empirical clinical data.
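To illustrate the sparse variable selection that makes LASSO attractive in the p ≫ n setting, the sketch below fits a LASSO by cyclic coordinate descent on synthetic data with 30 samples and 200 descriptors. It is a minimal illustration under stated assumptions, not the analysis pipeline used in this study: the data generator, the signal-carrying descriptor indices (3 and 17), the penalty value `lam = 0.2`, and the use of a continuous response (rather than a binary disease status) are all hypothetical choices made for clarity.

```python
import random


def soft_threshold(value: float, lam: float) -> float:
    """Shrink value toward zero by lam; values within [-lam, lam] become exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(cols, y, lam, max_iter=500, tol=1e-8):
    """Minimise (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    cols[j] holds the n values of descriptor j (column-major layout).
    """
    n = len(y)
    beta = [0.0] * len(cols)
    residual = list(y)  # y - X @ beta, with beta = 0 initially
    # Mean squared norm of each column: the curvature of each 1-D subproblem.
    z = [sum(v * v for v in col) / n for col in cols]
    for _ in range(max_iter):
        max_change = 0.0
        for j, col in enumerate(cols):
            old = beta[j]
            # Correlation of descriptor j with the partial residual (excluding j).
            rho = sum(c * r for c, r in zip(col, residual)) / n + z[j] * old
            new = soft_threshold(rho, lam) / z[j]
            if new != old:
                delta = new - old
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:  # stop once no coefficient moves appreciably
            break
    return beta


def make_p_much_greater_than_n(n=30, p=200, seed=0):
    """Hypothetical synthetic data: only descriptors 3 and 17 carry signal."""
    rng = random.Random(seed)
    cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    y = [2.0 * cols[3][i] - 1.5 * cols[17][i] + rng.gauss(0.0, 0.1)
         for i in range(n)]
    return cols, y


cols, y = make_p_much_greater_than_n()
beta = lasso_coordinate_descent(cols, y, lam=0.2)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("selected descriptors:", selected)
```

Because the soft-thresholding step sets coefficients to exactly zero, most of the 200 descriptors drop out, leaving the two signal-carrying ones (possibly alongside a few spurious ones). Raising `lam` yields a sparser selection; in practice the penalty would typically be tuned, for example by cross-validation.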