Abstract
Background: A common task in microarray data analysis is to identify informative genes that are differentially expressed between two different states. Owing to the high-dimensional nature of microarray data, identification of significant genes has been essential in analyzing the data. However, the performance of many gene selection techniques is highly dependent on the experimental conditions, such as the presence of measurement error or a limited number of sample replicates.
Results: We have proposed new filter-based gene selection techniques by applying a simple modification to significance analysis of microarrays (SAM). To evaluate the effectiveness of the proposed methods, we considered a series of synthetic datasets with different noise levels and sample sizes, along with two real datasets. The following findings were made. First, our proposed methods outperform conventional methods for all simulation set-ups. In particular, our methods are much better when the given data are noisy and the sample size is small. They showed relatively robust performance regardless of noise level and sample size, whereas the performance of SAM became significantly worse as the noise level increased or the sample size decreased. When sufficient sample replicates were available, SAM and our methods showed similar performance. Finally, our proposed methods are competitive with traditional methods in classification tasks for microarrays.
Conclusions: The results of the simulation study and real data analysis have demonstrated that our proposed methods are effective for detecting significant genes and for classification tasks, especially when the given data are noisy or have few sample replicates. By employing weighting schemes, we can obtain robust and reliable results for microarray data analysis.
Highlights
A common task in microarray data analysis is to identify informative genes that are differentially expressed between two different states
We consider 7 different combinations of n1 and n2 to take into account the effects of sample size and class imbalance on gene selection performance: (n1, n2) = (5, 5), (5, 10), (10, 5), (10, 10), (10, 15), (15, 10) and (15, 15)
This example illustrates the structure of noisy data containing outliers
Summary
Microarray technologies allow us to measure the expression levels of thousands of genes simultaneously. Analysis of such high-throughput data is not new, but it is still useful for statistical testing, which is a crucial part of transcriptomic research. A common task in microarray data analysis is to identify informative genes that are differentially expressed between two states, such as experimental conditions or biological phenotypes. Filter methods have been dominant over the past decades owing to their strong advantages, and they are the earliest in the literature [11,12,13,15,16]. They are preferred by biology and molecular domain experts because the results generated by feature ranking techniques are intuitive and easy to understand. We focus on the filter method in this study.
Source: Kang and Song, BMC Bioinformatics (2017) 18:389.
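To make the idea of filter-based gene ranking concrete, the following is a minimal sketch in Python of a SAM-style relative difference statistic, d_i = (mean1_i - mean2_i) / (s_i + s0), applied to synthetic two-class data for the seven (n1, n2) combinations listed in the highlights. It illustrates the conventional baseline only; the weighted modification proposed in the paper is not described in this section and is not reproduced here. The function names, the simulation parameters (number of genes, effect size, noise level) and the choice of s0 as the median of the gene-wise standard errors are assumptions made for illustration; SAM itself tunes s0 differently.

```python
import numpy as np

# Sample-size combinations (n1, n2) taken from the text.
SAMPLE_SIZES = [(5, 5), (5, 10), (10, 5), (10, 10), (10, 15), (15, 10), (15, 15)]


def sam_statistic(x1, x2, s0=None):
    """SAM-style relative difference per gene (rows = genes, columns = samples)."""
    n1, n2 = x1.shape[1], x2.shape[1]
    diff = x1.mean(axis=1) - x2.mean(axis=1)
    # Pooled within-class sum of squared deviations for each gene.
    ss = ((x1 - x1.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    ss += ((x2 - x2.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    # Gene-wise standard error of the mean difference.
    s = np.sqrt((1.0 / n1 + 1.0 / n2) * ss / (n1 + n2 - 2))
    if s0 is None:
        # Assumption: the median of s is a simple stand-in for SAM's tuned constant.
        s0 = np.median(s)
    return diff / (s + s0)


def simulate(n_genes, n_signif, n1, n2, shift, noise_sd, rng):
    """Hypothetical simulation: the first n_signif genes are shifted in class 1."""
    x1 = rng.normal(0.0, noise_sd, size=(n_genes, n1))
    x2 = rng.normal(0.0, noise_sd, size=(n_genes, n2))
    x1[:n_signif] += shift
    return x1, x2


def run(seed=0, n_genes=1000, n_signif=50):
    """Rank genes by |d| for each sample-size setting and report recovery."""
    rng = np.random.default_rng(seed)
    results = {}
    for n1, n2 in SAMPLE_SIZES:
        x1, x2 = simulate(n_genes, n_signif, n1, n2, shift=2.0, noise_sd=1.0, rng=rng)
        d = sam_statistic(x1, x2)
        top = np.argsort(-np.abs(d))[:n_signif]  # indices of the top-ranked genes
        # Fraction of truly differential genes among the top-ranked ones.
        results[(n1, n2)] = float(np.mean(top < n_signif))
    return results


if __name__ == "__main__":
    for sizes, recovered in run().items():
        print(sizes, round(recovered, 2))
```

With s0 set to 0 the statistic reduces to the ordinary pooled-variance two-sample t-statistic, which is a convenient sanity check. Raising noise_sd or shrinking the sample sizes in this sketch is one way to explore the sensitivity to noise and replicate count that the abstract describes.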