Abstract
Tertiary storage is becoming increasingly important for many organizations involved in large-scale data analysis and data mining activities. Yet database management systems (DBMS) and other data-intensive systems do not incorporate tertiary storage as a first-class citizen in the storage hierarchy. For instance, the typical solution for bringing tertiary-resident data under the control of a DBMS is to use operating system facilities to copy the data to secondary storage, and then to perform query optimization and execution as if the data had been in secondary storage all along. This approach fails to recognize the opportunities for saving execution time and storage space if the data were accessed on tertiary devices directly and in parallel with other I/Os. In this paper we examine issues in accessing secondary and tertiary storage in parallel and suggest buffering mechanisms for increasing the throughput of applications with concurrent, intensive I/O requirements. We first identify several factors that determine the parallel I/O performance of secondary and tertiary storage devices. We discuss the performance characteristics of magnetic disks and magnetic tapes when used alone and when used concurrently, sharing the same I/O bus. We then describe alternative buffering schemes for parallel I/O and analyze their efficiency via an experimental implementation.