Abstract

Background: There have been many recent breakthroughs in processing and analyzing large-scale data sets in biomedical informatics. For example, the CytoGPS algorithm has enabled the use of text-based karyotypes by transforming them into a binary model. However, such advances are accompanied by new problems of data sparsity, heterogeneity, and noisiness that are magnified by the large-scale multidimensional nature of the data. To address these problems, we developed the Mercator R package, which processes and visualizes binary biomedical data. We use Mercator to address biomedical questions of cytogenetic patterns relating to lymphoid hematologic malignancies, which include a broad set of leukemias and lymphomas. Karyotype data are among the most common forms of genetic data collected on lymphoid malignancies, because karyotyping is part of the standard of care in these cancers.

Results: In this paper we combine the analytic power of CytoGPS and Mercator to perform a large-scale multidimensional pattern recognition study on 22,741 karyotype samples in 47 different hematologic malignancies obtained from the public Mitelman database.

Conclusion: Our findings indicate that Mercator was able to identify both known and novel cytogenetic patterns across different lymphoid malignancies, furthering our understanding of the genetics of these diseases.

Highlights

  • There have been many recent breakthroughs in processing and analyzing large-scale data sets in biomedical informatics

  • Number of components and clusters: We applied CytoGenetic Pattern Sleuth (CytoGPS) to the lymphoid malignancy samples from the Mitelman database, which generated a binary matrix of 22,741 samples and 2,748 binary LGF features

  • Lymphoid karyotype clusters: We have shown that, by combining CytoGPS with Mercator to analyze 22,741 karyotypes obtained from the public Mitelman database, we are able to recover both simple and complex cytogenetic events that are important for understanding and classifying lymphoid malignancies

Summary

Introduction

There have been many recent breakthroughs in processing and analyzing large-scale data sets in biomedical informatics. The CytoGPS algorithm has enabled the use of text-based karyotypes by transforming them into a binary model. Such advances are accompanied by new problems of data sparsity, heterogeneity, and noisiness that are magnified by the large-scale multidimensional nature of the data. To address these problems, we developed the Mercator R package, which processes and visualizes binary biomedical data. Mercator supports five data visualization methods designed for both standard and high-dimensional data analysis; the visualization tools work with arbitrary distance metrics for any data type, not just binary. Mercator enables the exploratory unsupervised analysis of large, high-dimensional data sets, accompanied by clear, easy visualizations.
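To make the idea of a distance metric on binary data concrete, the following is a minimal sketch in Python. It is not Mercator's API (Mercator is an R package). The Jaccard distance is used only as one common choice of binary metric, and the function names and toy matrix are hypothetical, invented for illustration. Each row stands for a sample and each column for a binary feature.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two equal-length 0/1 vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # features present in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # features present in at least one
    # Assumption: two all-zero vectors are treated as identical (distance 0).
    return 0.0 if either == 0 else 1.0 - both / either


def distance_matrix(rows):
    """Symmetric matrix of pairwise Jaccard distances between rows."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = jaccard_distance(rows[i], rows[j])
        dist[i][j] = dist[j][i] = d
    return dist


# Toy data (hypothetical): 4 samples x 5 binary features.
samples = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
D = distance_matrix(samples)
```

A matrix like `D` is the kind of input that clustering and low-dimensional visualization methods typically consume. Sparse rows, such as the all-zero sample above, need an explicit convention, which is one reason the choice of metric matters for sparse binary data.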

