Autonomous Data Exchange: The Malady and a Possible Path to Its Cure

Eli Rohn

doi:10.4236/iim.2015.71003

Abstract

Data exchange is a goal-oriented social communications system implemented through computerized technology. Data definition languages (DDLs) provide the syntax for communicating illocutionary acts, such as informing, ordering and warning, within and between organizations. Data exchange results in a meaning-preserving mapping between an ensemble (a constrained variety) and its external (unconstrained) variety. Research on unsupervised structured and semi-structured data exchange has not produced any significant successes over the past fifty years. As a step towards finding a solution, this article proposes a new look at data exchange, using the principles of complex adaptive systems (CAS) to analyze current shortcomings and to propose a direction that may lead to a workable and mathematically grounded solution. Three CAS attributes are key to this research: variety, tension and entropy. We use them to show that older and contemporary DDLs are identical in their core, which explains why even XML and ontologies have failed to create a fully automated data exchange mechanism. We then show that it is possible to construct a radically different DDL that overcomes existing data exchange limitations; its variety, tension and entropy differ from those of existing solutions. The article has these major parts: a definition of the key CAS attributes; a quantitative examination of representative old and new DDLs using these attributes; a presentation of the results and their pessimistic ramifications; and a section that proposes a new theoretical way to construct DDLs, based entirely on CAS principles, that enables unsupervised data exchange. The theory is then tested, showing very promising results.

How to cite this paper: Rohn, E. (2015) Autonomous Data Exchange: The Malady and a Possible Path to Its Cure.

Highlights

Data exchange is a pervasive challenge faced in applications that need to query across multiple autonomous data sources.
Data structures can differ in three aspects: their structure, their field or tag names, and the syntax used to define the data structure.
The variety quantification method explained above shows that the distribution of words in each of the schemas follows a typical Zipf distribution (Figure 1).
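
As a rough illustration of what a Zipf-like word distribution in a schema means, the sketch below splits field names into words, ranks the words by frequency, and estimates the slope of log(frequency) against log(rank). It is not the variety quantification method of the paper, which is not reproduced here; the field names, the tokenizer, and the function names are all hypothetical, and a sample this small can only hint at the shape of the distribution.

```python
import math
import re
from collections import Counter

# Hypothetical field names standing in for one schema's vocabulary.
FIELD_NAMES = [
    "customer_id", "customer_name", "customer_address",
    "order_id", "order_date", "shipAddress", "customerPhone",
]

def tokenize(name):
    """Split a field or tag name (snake_case or camelCase) into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", name)
    return [part.lower() for part in parts]

def rank_frequency(names):
    """Return word frequencies sorted from most to least frequent."""
    counts = Counter(word for name in names for word in tokenize(name))
    return sorted(counts.values(), reverse=True)

def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    A slope in the neighbourhood of -1 is the classic Zipf signature.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

frequencies = rank_frequency(FIELD_NAMES)
slope = loglog_slope(frequencies)
print(frequencies, round(slope, 2))
```

On a real schema the same ranking would be applied to the full set of field or tag names; a few frequent words followed by a long tail of rare ones is the pattern the highlight refers to.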

Summary

Introduction

Data exchange is a pervasive challenge faced in applications that need to query across multiple autonomous data sources. It is a major challenge for companies entering into mergers or acquisitions. Data exchange is crucial for large enterprises that own a multitude of data sources, for progress in large-scale scientific projects (where data sets are produced independently by multiple researchers), for better cooperation among government agencies (each with its own data sources), and for offering good search quality across the millions of structured data sources on the World Wide Web.

Computerized data structures are constructed using a given syntax, usually referred to as the data definition language (DDL). The DDL specifies how to organize and interconnect related elementary pieces of data into usable structures, i.e., it codifies messages to be sent or received by computerized systems or their components. Data structures can differ in three aspects: their structure (which implies the level of detail), their field or tag names, and the syntax used to define the data structure.
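
The three aspects can be made concrete with a small sketch. Below, two hypothetical schemas describe the same kind of customer record: one is a flat mapping of field names to types, the other is nested XML with different tag names. The schemas, the field names, and the helper `leaf_paths` are assumptions made for illustration only; they do not come from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema A: flat, relational-style (field name -> type).
schema_a = {"cust_id": "int", "cust_name": "str", "zip": "str"}

# Hypothetical schema B: the same kind of record, written in a different
# syntax (XML), with different tag names and one extra level of nesting.
schema_b_xml = """
<Customer>
  <Identifier/>
  <FullName/>
  <Address>
    <PostalCode/>
  </Address>
</Customer>
"""

def leaf_paths(element, prefix=""):
    """Return the path of every leaf element, e.g. 'Customer/Address/PostalCode'."""
    path = f"{prefix}/{element.tag}" if prefix else element.tag
    children = list(element)
    if not children:
        return [path]
    paths = []
    for child in children:
        paths.extend(leaf_paths(child, path))
    return paths

paths_b = leaf_paths(ET.fromstring(schema_b_xml.strip()))

# Aspect 1, structure: every field in A sits directly under the record
# (depth 1), while B nests PostalCode one level deeper.
max_depth_a = 1
max_depth_b = max(path.count("/") for path in paths_b)

# Aspect 2, names: no field name in A matches a leaf tag name in B.
names_a = set(schema_a)
names_b = {path.rsplit("/", 1)[-1] for path in paths_b}
shared_names = names_a & names_b

# Aspect 3, syntax: A is a Python mapping and B is XML text, so each
# needs its own parser before any comparison is possible.
print(max_depth_a, max_depth_b, sorted(shared_names))
```

Even in this toy case a program cannot tell, without outside knowledge, that `zip` and `PostalCode` denote the same thing, which is the kind of gap the paragraph above describes.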

Results

Discussion

Conclusion