Abstract
With the development of World Wide Web (www), storage and utilization of web data has become a big challenge for data management research community. Web data are essentially heterogeneous data, and may change schema frequently, traditional relational data model is inappropriate for web data management. A new data model, called Wide Table (or WT for simplicity), was introduced for this task. There are several characteristics of the WT model. First, WT is usually highly sparsely populated so that most data can be fit into a line or record. Second, queries are composed on only a small subset of the attributes. Thus, existing query processing and optimization techniques for relational database with normalized tables will not work efficiently anymore. Furthermore, WT is usually of extremely large volume. It is thought that only large-scale distributed storage can accommodate themassive data set. In this paper, requirements and challenges to web data management are discussed. Existing techniques for WT, including logical presentation, physical storage, and query processing, are introduced and analyzed in detail.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have