Abstract

In this chapter we discuss our project on developing an integrated scientific data warehouse and data mining environment based on Oracle database technologies. The integrated environment has five specific requirements:

1. The data warehouse system should be scalable enough to host multiple data sets from various independent sources. Each data set could be on the order of 100 megabytes (or more). Hosting here means that a data set is either loaded directly into the database of the warehouse system or made accessible to the warehouse system via a dynamic link. Such a dynamic link can be thought of as a logical pointer that allows a virtual table to reference a data set on a storage device outside the database environment. The virtual table typically resides inside the database environment, where efficient data retrieval via SQL is easily achieved, while the data set itself is physically located outside it.

2. At run time, the data warehouse system allows a user to define relational linkages among tables from different independent sources. In an RDBMS (Relational Database Management System), such relational linkages are realized as foreign key references.

3. At run time, the data warehouse system allows a user to define a dynamic SQL query for retrieving data from different independent sources. In other words, the data warehouse has no prior knowledge of which SQL queries a user may issue, so the possible queries cannot all be implemented in advance.

4. The data warehouse provides a conversion tool for handling mixed data types, as well as basic data cleaning features, such as different ways of handling missing values. Specifically, the data warehouse provides an interactive tool for converting a continuous data type to an ordinal finite discrete data type or a categorical data type.

5. The data warehouse system is tightly integrated with the data mining tools that implement the data mining techniques discussed in the previous chapters.
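
As an illustration of requirement 4, the following is a minimal sketch of converting a continuous attribute to an ordinal discrete one while handling missing values. It is not the chapter's Oracle-based interactive tool: the function name discretize_equal_width, the equal-width binning rule, and the two missing-value policies ("keep" and "mean") are assumptions chosen only to make the idea concrete.

```python
import math
from typing import List, Optional, Sequence


def is_missing(v: Optional[float]) -> bool:
    """Treat None and NaN as missing values."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def discretize_equal_width(
    values: Sequence[Optional[float]],
    n_bins: int,
    missing: str = "keep",
) -> List[Optional[int]]:
    """Map continuous values to ordinal bin indices 0 .. n_bins - 1.

    Hypothetical helper for illustration only.
    missing="keep": missing values stay None in the output.
    missing="mean": missing values are replaced by the mean of the
                    observed values before binning.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if missing not in ("keep", "mean"):
        raise ValueError("missing must be 'keep' or 'mean'")

    observed = [float(v) for v in values if not is_missing(v)]
    if not observed:
        # Nothing to bin: every value is missing.
        return [None] * len(values)

    lo, hi = min(observed), max(observed)
    width = (hi - lo) / n_bins
    fill = sum(observed) / len(observed) if missing == "mean" else None

    result: List[Optional[int]] = []
    for v in values:
        if is_missing(v):
            v = fill
        if v is None:
            result.append(None)
        elif width == 0:
            # All observed values are identical: a single bin.
            result.append(0)
        else:
            # The maximum value falls on the upper edge; clamp it
            # into the last bin.
            result.append(min(int((v - lo) / width), n_bins - 1))
    return result


if __name__ == "__main__":
    readings = [0.0, 2.5, None, 5.0, 10.0]
    print(discretize_equal_width(readings, n_bins=4))
    print(discretize_equal_width(readings, n_bins=4, missing="mean"))
```

Equal-width binning is only one possible discretization rule. An interactive tool such as the one described would presumably let the user choose the number of bins, the cut points, and the missing-value policy per attribute.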
