Abstract
Motivation
The ever-increasing number of biomedical datasets provides tremendous opportunities for re-use, but current data repositories provide limited means of exploration apart from text-based search. Ontological metadata annotations provide context by semantically relating datasets. Visualizing this rich network of relationships can improve the explorability of large data repositories and help researchers find datasets of interest.
Results
We developed SATORI, an integrative search and visual exploration interface for biomedical data repositories. The design is informed by a requirements analysis through a series of semi-structured interviews. We evaluated the implementation of SATORI in a field study on a real-world data collection. SATORI enables researchers to seamlessly search, browse and semantically query data repositories via two visualizations that are highly interconnected with a powerful search interface.
Availability and implementation
SATORI is an open-source web application, which is freely available at http://satori.refinery-platform.org and integrated into the Refinery Platform.
Supplementary information
Supplementary data are available at Bioinformatics online.
Highlights
Public data repositories are rapidly growing in size and number through implementation of data release policies stipulated by journals and funding agencies (Margolis et al., 2014).
SATORI focuses on exploration of data repositories by finding datasets based on metadata attributes rather than comparing groups of metadata attributes.
SATORI contributes to the biomedical domain by unifying text-based search with visual exploration approaches, which put datasets into context and shed light on the repository-wide distribution of biological attributes.
Summary
Public data repositories are rapidly growing in size and number through implementation of data release policies stipulated by journals and funding agencies (Margolis et al., 2014). The availability of tens of thousands of datasets provides tremendous opportunities for re-use of data across studies. Previously published datasets can be used to test a novel hypothesis without having to generate new data. Data from previous studies can be employed as corroborating evidence for observations made in an experiment. Meta-studies that include data from dozens or hundreds of published datasets are another common use case for the repurposing of previously generated data. Lukk et al. (2010) studied patterns of gene expression in human tissues based on hundreds of public gene expression datasets, and a similar study was conducted for mouse tissues by Zheng-Bradley et al. (2010). Other groups have studied connections between different diseases using publicly available datasets (Caldas et al., 2009, 2012; Suthram et al., 2010).