Abstract

ObjectiveTo use software, datasets, and data formats in the domain of Infectious Disease Epidemiology as a test collection to evaluate a novel M1 use case, which we introduce in this paper. M1 is a machine that upon receipt of a new digital object of research exhaustively finds all valid compositions of it with existing objects. MethodWe implemented a data-format-matching-only M1 using exhaustive search, which we refer to as M1DFM. We then ran M1DFM on the test collection and used error analysis to identify needed semantic constraints. ResultsPrecision of M1DFM search was 61.7%. Error analysis identified needed semantic constraints and needed changes in handling of data services. Most semantic constraints were simple, but one data format was sufficiently complex to be practically impossible to represent semantic constraints over, from which we conclude limitatively that software developers will have to meet the machines halfway by engineering software whose inputs are sufficiently simple that their semantic constraints can be represented, akin to the simple APIs of services. We summarize these insights as M1-FAIR guiding principles for composability and suggest a roadmap for progressively capable devices in the service of reuse and accelerated scientific discovery. ConclusionAlgorithmic search of digital repositories for valid workflow compositions has potential to accelerate scientific discovery but requires a scalable solution to the problem of knowledge acquisition about semantic constraints on software inputs. Additionally, practical limitations on the logical complexity of semantic constraints must be respected, which has implications for the design of software.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call