Many of the database research issues involved in dealing with multimedia data are similar to those that arise in handling other nontraditional data such as spatial, image, temporal, text, document, and scientific data. We focus here on multimedia database issues, which we hope can serve as a starting point for a wider discussion. Our examples are often taken from the spatial domain as this is where we have the greatest expertise, although these issues are far more general.
Why do we want a database? The natural and simple answer is to be able to store and retrieve data efficiently. Notice the emphasis on retrieval. We should not lose sight of this purpose. For example, it means that storing images in long fields in a relational database is usually not the answer. Long fields are usually a stopgap solution as they are just a repository for data and do not aid in its retrieval. In particular, as the data volume gets large, this solution breaks down because the tuples get too large.
We need to be able to integrate nontraditional data with traditional (e.g., alphanumeric) data. Alphanumeric data can frequently be treated just like locational data in that the records that make up the alphanumeric data are like points in a higher-dimensional space where each attribute is analogous to a spatial dimension. The difference is that spatial data have more than just a locational component. In particular, spatial data are distinguished from nonspatial data by having spatial extent. A number of attempts at integration take advantage of this analogy. However, it can also act as a straitjacket in the case of the relational model. Some examples of successful integration include spatial with nonspatial data [Aref and Samet 1990], temporal with nontemporal data including spatial data [Hjaltason and Samet 1995], document with nondocument data [Sacks-Davis et al. 1995], image with locational and nonlocational data [Samet and Soffer 1995], and so on.
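The analogy between records and points can be illustrated with a minimal Python sketch. The `Record` type, its two attributes, and the helper names `in_box` and `boxes_overlap` are hypothetical and chosen only for illustration; they do not come from any of the cited systems. A conjunctive range query over attributes becomes a containment test against a box in attribute space, whereas an object with spatial extent needs an overlap test instead.

```python
from typing import NamedTuple, Sequence


class Record(NamedTuple):
    """Hypothetical alphanumeric record; each attribute acts as one axis."""
    age: int
    salary: int


def in_box(point: Sequence[float], low: Sequence[float], high: Sequence[float]) -> bool:
    """True if the point lies inside the axis-aligned box [low, high]."""
    return all(lo <= v <= hi for v, lo, hi in zip(point, low, high))


def boxes_overlap(a_low, a_high, b_low, b_high) -> bool:
    """True if two axis-aligned boxes intersect. Objects with spatial
    extent need this test; a point-in-box test is not enough for them."""
    return all(al <= bh and bl <= ah
               for al, ah, bl, bh in zip(a_low, a_high, b_low, b_high))


records = [Record(25, 40_000), Record(40, 85_000), Record(52, 60_000)]

# "age between 30 and 60 AND salary between 50,000 and 90,000"
# is a rectangular window in the two-dimensional attribute space.
matches = [r for r in records if in_box(r, (30, 50_000), (60, 90_000))]
```

The point test treats every record as a location with no extent, which is exactly where the analogy stops for spatial objects such as regions or line segments.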
Efficient retrieval is facilitated by building an index [Samet 1990a, 1990b]. This means that we need to find a way to sort the data. Surprisingly, this is not always done for such applications (e.g., it is absent in the Photobook image database system [Pentland et al. 1994]). The index should be compatible with the data that are being stored, and we also need to choose an appropriate zero or reference point for it. The index should be implicit rather than explicit, as it is impossible to foresee all possible queries in advance. For example, in the spatial domain, assuming a relational model, it is impractical to have an attribute for each spatial relationship (e.g., north, northeast, left, etc.). Instead, the index should enable us to derive these relationships on the fly. As a more concrete example, an explicit index would sort two-dimensional locational data on the basis of distance from a given point x; yet this would not be very useful if we wanted to have the locations sorted with respect to a different point y. In particular, we
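The contrast between an explicit and an implicit index can be sketched in Python as follows. This is an illustration under stated assumptions, not the structure the text has in mind: the k-d tree is used only as one familiar space-partitioning index, and the names `Node`, `build`, `nearest`, and `explicit_order` are hypothetical. The explicit ordering is tied to one reference point, while the tree sorts the data by position alone and so can answer a nearest-location query for any point.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def explicit_order(points: List[Point], x: Point) -> List[Point]:
    """Explicit index: an ordering tied to one fixed reference point x."""
    return sorted(points, key=lambda p: math.dist(p, x))


class Node:
    def __init__(self, point: Point, axis: int,
                 left: Optional["Node"], right: Optional["Node"]):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Implicit index: a k-d tree that partitions space by position only,
    without reference to any particular query point."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def nearest(node: Optional[Node], q: Point,
            best: Optional[Point] = None) -> Optional[Point]:
    """Nearest stored location to an arbitrary query point q."""
    if node is None:
        return best
    if best is None or math.dist(q, node.point) < math.dist(q, best):
        best = node.point
    diff = q[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, q, best)
    # Visit the far side only if the splitting line is closer than the best so far.
    if abs(diff) < math.dist(q, best):
        best = nearest(far, q, best)
    return best


locations = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(locations)
x, y = (9, 2), (3, 6)
by_x = explicit_order(locations, x)   # useful for x only
near_x = nearest(tree, x)             # the same tree serves x ...
near_y = nearest(tree, y)             # ... and a different point y
```

The list `by_x` would have to be rebuilt from scratch to serve `y`, whereas the tree is built once and reused for both queries.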