Abstract

The NC-94 dataset, which contains climate, soil and crop data for the 30 years from 1971 to 2000 for all counties in the north central United States, is an important resource for the agricultural community. Analyzing the dataset can yield invaluable understanding for farmers, scientists, the public, planners, and policy makers seeking to improve crop practices and yields, undertake scientific studies, and develop policy. In the parametric model and its query language ParaSQL, the concept of a dimension is built in at the level of primitive values. CanStoreX (Canonical Storage for XML) is a technology for storing large XML documents, in the terabyte range, in a paginated form on disk that can be accessed easily and efficiently while requiring only a small amount of main memory. CanStoreX is used as the back end for storing the NC-94 data. It hides the heterogeneity of the climate, crop, and soil data so that users see a simple view of counties as objects, with the geographical and time dimensions implicit and taken for granted. This work focuses on loading the NC-94 database onto the CanStoreX storage platform. The combination of the existing parametric query constructs with an efficient storage structure will provide an important tool for researchers who wish to analyze the NC-94 dataset. The loading process also revealed important inconsistencies in the data, which we have tried to address in order to develop a more consistent view of the dataset. Previously, only the climate data was available, and it was stored in an older version of CanStoreX in which XML was stored in text form. The newer binary version of CanStoreX readily supports tree-like navigation of the paginated XML document. Adding the crop and soil data required a different internal representation to give users a uniform view on par with that of the climate data. Further, the internals were adapted to use the version of CanStoreX in which pages are stored in binary rather than text form.
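To make the user-facing idea of "counties as objects where geographical and time dimensions are implicit" concrete, the following is a minimal sketch under stated assumptions. It is not the CanStoreX storage layer or the ParaSQL API: the `County` class, its fields (`fips`, `daily_precip_mm`, `corn_yield`), and the helpers `when`, `restrict`, and `select_counties` are hypothetical names chosen for illustration. Each time-varying attribute is modeled as a mapping from time points to values, and a condition evaluates to the set of time points where it holds, loosely mimicking the parametric idea that a selection yields a domain rather than a single true/false. The values used are toy numbers, not NC-94 data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Callable


@dataclass
class County:
    """Hypothetical county object: identity plus time-varying attributes.

    Each time-varying attribute maps a time point to a value, so the
    time dimension is implicit in the attribute rather than a separate column.
    """
    fips: str
    name: str
    state: str
    daily_precip_mm: dict[date, float] = field(default_factory=dict)  # date -> mm
    corn_yield: dict[int, float] = field(default_factory=dict)        # year -> yield


def when(values: dict, predicate: Callable[[float], bool]) -> set:
    """Return the set of time points at which the predicate holds."""
    return {t for t, v in values.items() if predicate(v)}


def restrict(values: dict, domain: set) -> dict:
    """Restrict a time-varying attribute to the given set of time points."""
    return {t: v for t, v in values.items() if t in domain}


def select_counties(counties, attr: str, predicate) -> dict:
    """Map each county name to the time points where `attr` satisfies
    `predicate`; counties where the condition never holds are omitted."""
    result = {}
    for c in counties:
        domain = when(getattr(c, attr), predicate)
        if domain:
            result[c.name] = domain
    return result


if __name__ == "__main__":
    # Toy, made-up values for illustration only (not NC-94 data).
    a = County("00001", "County A", "XX", corn_yield={1998: 140.0, 1999: 155.0})
    b = County("00002", "County B", "XX", corn_yield={1998: 120.0, 1999: 130.0})
    print(select_counties([a, b], "corn_yield", lambda y: y > 145.0))
    # {'County A': {1999}}
```

The design choice here is that the query returns *when* a condition is true for each county, so the time dimension never has to be spelled out as an explicit join or grouping column by the user.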
