Abstract

Optimization strategies for global queries have, until very recently, assumed a priori that the network node-to-node bandwidth is at least an order of magnitude smaller than the disk-to-memory bandwidth at a given node. This assumption is appropriate for “long-haul” networks in which node-to-node data transmission rates are limited by the transmission rates available over the public data network.

Query optimization strategies were heavily influenced by these assumptions and consequently focused on reducing node-to-node transmission costs. Local processing costs were, for the most part, considered to be “free”.

In a modern high-speed local area network this assumption is simply not true. In fact, in a fiber-optic or coaxial-based local area network with 10 Mbit or greater capacity, the node-to-node bandwidth may indeed be greater than the disk-to-memory bandwidth of a local node. If there is no fragmentation of relations, only the local processing costs of selection and projection need be considered in any costing algorithm. However, if relations are fragmented, and joins and semi-joins are executed, the data transmission costs (including the costs of creating and destroying temporary relations) must be taken into consideration.

Two strategies are compared and discussed for optimizing global queries in a distributed relational database with horizontally fragmented relations. The first strategy employs semi-joins to reduce the relations (or fragments) involved in the query. The second strategy replicates the query at all nodes which are involved in the query. Here, we are mostly concerned with local area networks in which the network backbone (i.e., the physical media connecting the network nodes: typically fiber-optic or coaxial cable) can support transmission rates of greater than, say, 10 MBS.

The semi-join strategy results in increased amounts of data being transmitted between nodes. However, the reduced relations lower local processing costs, since there are fewer tuples to be examined. The replication strategy, on the other hand, reduces the amount of data passed between nodes. Each node participating in the query processes the entire query, which increases local processing costs and correspondingly decreases node-to-node data transmission costs.
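As a rough illustration of the reduction step that the semi-join strategy relies on, the following minimal Python sketch reduces one fragment by the join-attribute values of another before joining them. The fragment contents, the attribute names (`dept`, `name`, `city`), and the helpers `semi_join` and `join` are hypothetical and chosen only for this example; the sketch does not model transmission or disk costs and is not the costing algorithm of the paper.

```python
def semi_join(r, s, key):
    """Return the tuples of r whose join-attribute value also appears in s."""
    # Projection of s on the join attribute: the only data that has to be
    # sent to the node holding r in order to perform the reduction.
    s_keys = {row[key] for row in s}
    return [row for row in r if row[key] in s_keys]


def join(r, s, key):
    """Equi-join of r and s on a single shared attribute."""
    index = {}
    for row in s:
        index.setdefault(row[key], []).append(row)
    return [{**a, **b} for a in r for b in index.get(a[key], [])]


# Hypothetical fragments held at two different nodes.
emp_fragment = [
    {"dept": 1, "name": "Ada"},
    {"dept": 2, "name": "Bob"},
    {"dept": 3, "name": "Cy"},
]
dept_fragment = [
    {"dept": 1, "city": "Oslo"},
    {"dept": 3, "city": "Rome"},
]

# Reduce emp_fragment first, then join only the surviving tuples.
reduced = semi_join(emp_fragment, dept_fragment, "dept")
result = join(reduced, dept_fragment, "dept")
```

The reduced fragment yields the same join result as the full fragment while leaving fewer tuples to examine at the joining node, which is the local processing saving described above.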

