Abstract
Merging gene expression datasets is a simple way to increase the number of samples in an analysis. However, experimental and data processing conditions, which are specific to each dataset or batch, generally influence the expression values and can hide the biological effect of interest. It is therefore important to normalize the merged dataset, as failing to adjust for these batch effects may adversely impact statistical inference. Batch effect removal methods are generally based on a location-scale approach, although less widespread methods based on matrix factorization have also been proposed. We investigate, on breast cancer data, how these batch effect removal methods improve (or possibly degrade) the performance of simple classifiers. Our results indicate that the matrix factorization approach deserves greater attention, as it gives results at least as good as common location-scale methods, and significantly better results in specific cases.
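To make the location-scale idea concrete, the following is a minimal sketch of the simplest such adjustment: each gene is centered and scaled within each batch, so that every batch ends up with zero mean and unit variance per gene. It is an illustration under stated assumptions, not the specific methods compared in this work. The function name `location_scale_adjust`, the samples-by-genes layout, and the requirement of at least two samples per batch are all assumptions introduced here.

```python
import numpy as np

def location_scale_adjust(X, batches):
    """Per-batch, per-gene standardization (hypothetical helper).

    X       : array of shape (n_samples, n_genes), expression values
    batches : array of shape (n_samples,), batch label of each sample
    Assumes every batch contains at least two samples.
    """
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    adjusted = np.empty_like(X)
    for b in np.unique(batches):
        mask = batches == b
        mu = X[mask].mean(axis=0)          # location: per-gene batch mean
        sd = X[mask].std(axis=0, ddof=1)   # scale: per-gene batch std
        sd = np.where(sd > 0, sd, 1.0)     # leave constant genes unscaled
        adjusted[mask] = (X[mask] - mu) / sd
    return adjusted
```

A merged matrix would be passed together with one batch label per sample, for example `location_scale_adjust(X_merged, batch_labels)`. A limitation of this simple form is that it also removes genuine biological differences when the classes of interest are unevenly distributed across batches.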