Abstract

The remarkable growth of emerging technologies and computing paradigms in cyberspace and cyber-physical systems has produced a vast number of data sources. These autonomous, heterogeneous data sources can contain complementary or semantically equivalent information stored in formats ranging from structured to semi-structured to unstructured. Such heterogeneity affects data semantics and meaning, so knowledge management has become increasingly difficult and sometimes fruitless. In this paper, we propose a new scalable model, named Distributed Semantic Network (DSN), that represents heterogeneous data and extracts more semantic information from different data sources. We use the prior knowledge of WordNet and Wikipedia to scale out DSN horizontally and vertically. Furthermore, we propose a MapReduce-based framework to construct the knowledge base more effectively using Parallel and Distributed Computing (PDC). The experimental results show that DSN better models the semantic information in text: it extracts a larger amount of information with higher precision, achieving a 34% increase in quantity and a 15% improvement in precision over the best-performing alternative method on the same datasets. On the three datasets, our proposed PDC framework reduces processing time by a factor of 5.8 to 11.5.
