Abstract

In the first chapter we present the scaffolding of the book, introducing novel approaches to data modeling, such as type theory, conceptual spaces, and graph database architectures. We posit that any database, data set, or information space should be engineered with the expectation that multiple (not fully isomorphic) software components will interact with that data, and that parts of the data will be passed and shared among those components, implying that data should be structured to facilitate cross-component communication. We propose that most of the theoretical constructions summarized here vis-à-vis hypergraph or code models may be concretely instantiated through virtual machines, via which query evaluation engines may be implemented. We use the architecture of virtual machines in this category as an organizing motif for our analyses. From a more practical or “applied” point of view, we will call attention in particular to biomedical research projects that synthesize information of variegated disciplinary provenance and diverse data profiles. While biomedical research is inherently interdisciplinary, new breakthroughs, research methods, and technologies have further accelerated cross-disciplinary insights in several specific biomedical disciplines, yielding diagnostic, prognostic, and explanatory models that cut across biophysical scales (molecular, cellular, tissue, organ) and data acquisition modalities (proteomics, genomics, biopsies, image processing, laboratory assays such as biologic sample analyses, and so forth). Examining the literature in which these integrative studies are described makes clear that scientists often construct the software ecosystems powering their research in ad hoc ways, piecing together diverse software components (sometimes standalone applications, sometimes code libraries, or some combination thereof) designed for specific disciplinary contexts. We argue in this book that this relatively informal, trial-and-error approach to integrating multidisciplinary biomedical data can impede research replication and the systematic evaluation of interdisciplinary research findings. This warrants a detailed review of data profiles, data modeling paradigms, and data integration techniques, so as to lay the foundation for a software ecosystem that can support the emerging paradigm of transparent and replicable research data and digital scientific resources, such as code libraries and publications.
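To make the architectural claim above more concrete, the sketch below illustrates, under stated assumptions, the kind of arrangement the abstract has in view: a small hypergraph data structure whose nodes carry typed attributes, queried through a miniature stack-based virtual machine so that different software components can exchange compiled query programs rather than one another's native query dialects. The names used here (HyperGraph, run_query, the opcodes MATCH_ATTR, EXPAND, INTERSECT) and the example biomedical identifiers are hypothetical illustrations, not constructs taken from the book.

# Minimal sketch (hypothetical; not drawn from the book) of a hypergraph store
# whose queries are expressed as a tiny instruction set and executed by a
# stack-based virtual machine, so multiple components can share one query form.

from dataclasses import dataclass, field

@dataclass
class HyperGraph:
    """Nodes carry typed attribute dictionaries; hyperedges join any number of nodes."""
    nodes: dict = field(default_factory=dict)   # node_id -> {attribute: value}
    edges: dict = field(default_factory=dict)   # edge_id -> frozenset of node_ids

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, edge_id, *node_ids):
        self.edges[edge_id] = frozenset(node_ids)

def run_query(graph, program):
    """Evaluate a query program: a list of (opcode, argument) instructions
    operating on a stack of node-id sets."""
    stack = []
    for op, arg in program:
        if op == "MATCH_ATTR":      # push nodes whose attribute equals the given value
            key, value = arg
            stack.append({n for n, a in graph.nodes.items() if a.get(key) == value})
        elif op == "EXPAND":        # replace top set with all co-members of its hyperedges
            nodes = stack.pop()
            expanded = set()
            for members in graph.edges.values():
                if members & nodes:
                    expanded |= members
            stack.append(expanded)
        elif op == "INTERSECT":     # intersect the top two sets
            stack.append(stack.pop() & stack.pop())
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

if __name__ == "__main__":
    g = HyperGraph()
    g.add_node("p53", kind="protein")
    g.add_node("TP53", kind="gene")
    g.add_node("biopsy-17", kind="sample")
    g.add_edge("assay-1", "p53", "biopsy-17")
    g.add_edge("encodes", "TP53", "p53")
    # "Which entities co-occur in some record with a protein?"
    program = [("MATCH_ATTR", ("kind", "protein")), ("EXPAND", None)]
    print(run_query(g, program))    # p53, TP53 and biopsy-17 (set order may vary)

The point of the small instruction set is the sharing story itself: any component able to emit these opcodes can query the shared store, regardless of the discipline-specific tooling it was originally built with.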
