Abstract

BackgroundData from patients with rare diseases is often produced using different platforms and probe sets because patients are widely distributed in space and time. Aggregating such data requires a method of normalization that makes patient records comparable.ResultsThis paper proposed DBNorm, implemented as an R package, is an algorithm that normalizes arbitrarily distributed data to a common, comparable form. Specifically, DBNorm merges data distributions by fitting functions to each of them, and using the probability of each element drawn from the fitted distribution to merge it into a global distribution. DBNorm contains state-of-the-art fitting functions including Polynomial, Fourier and Gaussian distributions, and also allows users to define their own fitting functions if required.ConclusionsThe performance of DBNorm is compared with z-score, average difference, quantile normalization and ComBat on a set of datasets, including several that are publically available. The performance of these normalization methods are compared using statistics, visualization, and classification when class labels are known based on a number of self-generated and public microarray datasets. The experimental results show that DBNorm achieves better normalization results than conventional methods. Finally, the approach has the potential to be applicable outside bioinformatics analysis.

Highlights

  • Data from patients with rare diseases is often produced using different platforms and probe sets because patients are widely distributed in space and time

  • A number of normalization methods have been proposed such as Average Difference (AvgDiff) [c], Total Count (TC) [2], and Trimmed Mean of M values (TMM) [15]

  • Distribution-based normalization method, distribution-based normalization (DBNorm) that works on microarray data from multiple sources regardless of the distributions of the data in each

Read more

Summary

Introduction

Data from patients with rare diseases is often produced using different platforms and probe sets because patients are widely distributed in space and time. Personalised or precision medicine aims to find specific therapeutics best-suited for individuals based on their genomic data [10]. It is considered a promising approach for diseases with a genetic component such as cancer. A number of normalization methods have been proposed such as Average Difference (AvgDiff) [c], Total Count (TC) [2], and Trimmed Mean of M values (TMM) [15] These methods normalize microarray data by aligning the mean and/or variance and work

Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.