An Architecture for Unstructured Data Management

Yao Hu Lin, Xue Lian Lin

doi:10.4028/www.scientific.net/amr.756-759.1280

Abstract

With the arrival of the information age, a vast amount of information is available on the Internet, and most data on the Web are unstructured. Significant data should nevertheless be organized and stored in a suitable way for future use, and the management of unstructured data remains an unsolved problem. Unstructured data such as presentations, spreadsheets, text documents, memos, images, and web pages become difficult to manage as the data grow in scale and users have different requirements and interests. In this paper, we propose an architecture for unstructured data management that integrates source query, data collection, and data management to address these problems. The data collection layer extracts the data of interest: existing tools extract data automatically, and data can also be added to the repository manually. The data management layer manages all collected data by classifying the data, selecting nodes for storage, and maintaining a centralized index. The source query layer allows users to query and retrieve diverse data through an adaptive query service and a recommendation service. Finally, we implemented a prototype system, OCourse, based on this architecture to demonstrate its feasibility and efficiency.
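
The three layers described in the abstract can be illustrated with a minimal Python sketch. This is not the OCourse implementation; every name here (Item, CATEGORIES, CollectionLayer, ManagementLayer, QueryLayer, build_demo) is hypothetical, and the concrete rules are assumptions chosen only for illustration: classification by file extension, node selection by a CRC32 hash of the item name, keyword matching for queries, and same-category lookup for recommendations.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    text: str


# Assumed classification rule: category is derived from the file extension.
CATEGORIES = {".ppt": "presentation", ".xls": "spreadsheet",
              ".txt": "text", ".html": "web page", ".png": "image"}


class CollectionLayer:
    """Gathers items automatically (via an extractor) or manually."""

    def __init__(self):
        self.collected = []

    def add_manually(self, item):
        self.collected.append(item)

    def extract(self, extractor, sources):
        # 'extractor' stands in for an existing extraction tool.
        for source in sources:
            self.collected.append(extractor(source))


class ManagementLayer:
    """Classifies items, picks a storage node, keeps a central index."""

    def __init__(self, nodes):
        self.nodes = {n: [] for n in nodes}
        self.index = {}  # item name -> (category, node)

    def classify(self, item):
        for ext, category in CATEGORIES.items():
            if item.name.lower().endswith(ext):
                return category
        return "other"

    def select_node(self, item):
        # Assumed placement policy: deterministic hash of the item name.
        names = sorted(self.nodes)
        return names[zlib.crc32(item.name.encode()) % len(names)]

    def store(self, item):
        category, node = self.classify(item), self.select_node(item)
        self.nodes[node].append(item)
        self.index[item.name] = (category, node)


class QueryLayer:
    """Answers keyword queries and recommends same-category items."""

    def __init__(self, management):
        self.management = management

    def query(self, keyword):
        kw = keyword.lower()
        return sorted(it.name for items in self.management.nodes.values()
                      for it in items if kw in it.text.lower())

    def recommend(self, name):
        category = self.management.index[name][0]
        return sorted(n for n, (c, _) in self.management.index.items()
                      if c == category and n != name)


def build_demo():
    collection = CollectionLayer()
    collection.add_manually(Item("intro.ppt", "Course overview slides"))
    collection.extract(lambda src: Item(src[0], src[1]),
                       [("notes.txt", "Lecture notes for the course"),
                        ("grades.xls", "Exam scores"),
                        ("summary.txt", "Summary of lecture one")])
    management = ManagementLayer(["node-a", "node-b"])
    for item in collection.collected:
        management.store(item)
    return management, QueryLayer(management)
```

In this sketch the query layer never touches the collection layer directly; it reads only what the management layer has stored and indexed, which mirrors the separation of concerns the abstract describes.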
