Abstract

Microarrays apply electrical engineering and related technologies to biology, allowing the expression of numerous genes to be measured simultaneously, and they can be used to analyze specific diseases. This study performs classification analyses on various microarray datasets to compare the performance of classification algorithms across different data traits. The datasets were classified into test and control groups using five machine learning methods: MultiLayer Perceptron (MLP), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and k-Nearest Neighbors (KNN). Performance was evaluated with k-fold cross-validation, and the resulting accuracies of the five methods were compared. In the experiments, the two tree-based methods, DT and RF, showed similar trends to each other, as did the remaining three methods, MLP, SVM, and KNN. DT and RF generally performed worse than the other methods, except on one dataset. This suggests that, for effective classification of microarray data, selecting a classification algorithm suited to the data traits is crucial for optimal performance.
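The evaluation protocol described above (split the samples into k folds, train on k-1 of them, score on the held-out fold, and average the accuracies) can be illustrated with a minimal plain-Python sketch. It is not the study's implementation: the data are synthetic two-class Gaussian values standing in for expression profiles, only KNN is shown out of the five methods, k = 5 and three neighbors are assumed values, and every function name here is hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Predict the label of x by majority vote among its nearest training samples."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_validate(X, y, predict, k=5, seed=0):
    """Return the per-fold accuracy of `predict` under k-fold cross-validation."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        # Train on every sample outside the held-out fold.
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(X)) if i not in held]
        correct = sum(predict(train_X, train_y, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return accuracies


def make_toy_expression_data(n_per_class=30, n_genes=20, shift=2.0, seed=0):
    """Synthetic stand-in for expression data (assumption, not the study's data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in (("control", 0.0), ("test", shift)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0) for _ in range(n_genes)])
            y.append(label)
    return X, y


X, y = make_toy_expression_data()
fold_accuracies = cross_validate(X, y, knn_predict, k=5)
mean_accuracy = sum(fold_accuracies) / len(fold_accuracies)
print(f"KNN mean accuracy over 5 folds: {mean_accuracy:.2f}")
```

Because `cross_validate` accepts any function with the same signature as `knn_predict`, other classifiers could be compared on identical folds by passing a different `predict` function. The folds here are not stratified by class, which is a simplification.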

Highlights

  • Microarrays have been developed by combining modern mechanical and electrical engineering technologies with the existing knowledge in molecular biology

  • All samples of the patients diagnosed with interstitial lung disease (ILD) or chronic obstructive pulmonary disease (COPD) were obtained from the Lung Tissue Research Consortium (LTRC)

  • The results show that Support Vector Machine (SVM), MultiLayer Perceptron (MLP), and k-Nearest Neighbors (KNN) had accuracies of over 80%

Summary

Introduction

Microarrays have been developed by combining modern mechanical and electrical engineering technologies with existing knowledge in molecular biology. While traditional methods allowed researchers to measure the expression of only a small number of genes at a time, the introduction of microarrays enabled the expression analysis of tens of thousands of genes in a single experiment. This led to the development of experimental techniques capable of generating a large volume of genomic information from a single cell [1]. Classification analysis is a widely used multivariate statistical method for determining or predicting the classes of unknown groups of data. This method has typically been used to analyze cancer microarray data, and many recent studies have accurately classified acute myeloid leukemia and acute lymphoblastic leukemia using it [6].

Methods
Results
Conclusion
