Abstract

While catchphrases such as big data, smart data, data-intensive science, or smart dust highlight different aspects, they share a common theme: a shift toward a data-centered perspective in which the synthesis and analysis of data at an ever-increasing spatial, temporal, and thematic resolution promise new insights while, at the same time, reducing the need for strong domain theories as starting points. In terms of the envisioned methodologies, those catchphrases tend to emphasize the role of predictive analytics, that is, statistical techniques including data mining and machine learning, as well as supercomputing. Interestingly, however, while this perspective takes the availability of data as a given, it does not answer the questions of how one would discover the required data in today's chaotic information universe, how one would understand which data sets can be meaningfully integrated, and how to communicate the results to humans and machines alike. The semantic web addresses these questions. In the following, we argue why the data train needs semantic rails. We point out that making sense of data and gaining new insights work best if inductive and deductive techniques go hand-in-hand instead of competing over the prerogative of interpretation.

Highlights

  • Krzysztof Janowicz, Frank van Harmelen, James A

  • We will argue that the semantic web provides such an infrastructure and is entirely based on open and well-established standards

  • Instead of presenting an all-encompassing survey, we will focus on selected aspects that are of particular interest to data-intensive science (Hey, Tansley, and Tolle 2009), thereby illustrating the value proposition of semantic technologies and ontologies


Summary

Why the Data Train Needs Semantic Rails

Krzysztof Janowicz, Frank van Harmelen, James A.

Google Maps, for instance, draws different borders depending on whether a user is accessing the service from India or the United States. To avoid such (and many other) difficulties, it is crucial to provide humans and machines with additional information: for instance, the fact that the coordinates used represent points (centroids) and not polygons, that topological information about neighboring states is available (Russia and Ukraine share a border, while Russia and Pakistan do not), that 17 percent of Ukraine’s population is of Russian ethnicity, that the UN list of member states has been used as an extensional definition of the term country, that the centroids were recorded by a mapping agency in India, and so forth. New insights are gained from integrating and mining multithematic and multiperspective data from highly heterogeneous resources across domains and disciplines.
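As a rough illustration of what such additional information could look like to a machine, the sketch below encodes a few of the statements mentioned above as subject-predicate-object triples in plain Python. It is only a minimal sketch and not the authors' method: every `ex:` identifier and both helper functions are hypothetical names invented for this example, and a real deployment would more likely rely on established semantic web vocabularies and tooling.

```python
# Minimal sketch: metadata about a country data set as (subject, predicate, object) triples.
# All "ex:" names are hypothetical and chosen only for illustration.
triples = {
    # Topological information about neighboring states (stated in both directions).
    ("ex:Russia", "ex:sharesBorderWith", "ex:Ukraine"),
    ("ex:Ukraine", "ex:sharesBorderWith", "ex:Russia"),
    # The coordinates represent points (centroids), not polygons.
    ("ex:coordinates", "ex:geometryType", "ex:Centroid"),
    # Provenance: who recorded the centroids.
    ("ex:coordinates", "ex:recordedBy", "ex:MappingAgencyInIndia"),
    # The term "country" is defined extensionally by the UN list of member states.
    ("ex:Country", "ex:extensionallyDefinedBy", "ex:UNMemberStateList"),
}


def shares_border(a, b):
    """Return True only if a border between a and b has been explicitly stated."""
    return (a, "ex:sharesBorderWith", b) in triples


def objects(subject, predicate):
    """Return all objects stated for the given subject and predicate, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


print(shares_border("ex:Russia", "ex:Ukraine"))
print(shares_border("ex:Russia", "ex:Pakistan"))
print(objects("ex:coordinates", "ex:geometryType"))
```

Note that a `False` result here only means that no such statement was recorded in this small set; the sketch does not attempt to model how a reasoner would treat missing information.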

Synthesis Is the New Analysis
Smart Data Versus Smart Applications
Vocabulary Diverse Data
Linking and Exploring Data
Compressing and Maintaining Data
Combining Inductive and Deductive Methods
Findings
Conclusions
