Abstract
Multidimensional data are ubiquitous, and their efficient management and analysis are a core database research problem. Much previous work has focused on indexing, analyzing, and querying multidimensional data. In this dissertation, three challenging advanced analysis and join problems in multidimensional spaces are proposed and studied, and efficient solutions are provided for their related applications.

First, the problem of the generalized budget constrained optimization query (Gen-BOQ) is studied. In real life, it is often difficult for manufacturers to create new products that dominate those of their competitors, due to certain constraints. These constraints can be modeled by constraint functions, and the problem is then to decide the best possible regions in multidimensional spaces where the features of new products could be placed. The profitability of these regions is evaluated using the number of dominating and dominated objects, and the best regions are then returned. Although Gen-BOQ computation is challenging due to its high complexity, an efficient divide-and-conquer based framework is offered for this problem. In addition, an approximation method is proposed that trades off result quality against query cost.

Next, the efficient evaluation of all top-k queries (ATOPk) in multidimensional spaces is investigated. An ATOPk query computes the top-ranked objects for a group of preference functions simultaneously. As an application of such a query, consider an online store that needs to provide recommendations for a large number of users simultaneously. This problem has been somewhat overlooked by past research; in this thesis, batch algorithms are proposed instead of naively evaluating top-k queries individually. Similar preferences are grouped together, and two algorithms are proposed, one using block indexed nested loops and the other a view-based thresholding strategy. The optimized view-based threshold algorithm is demonstrated to be consistently the best.
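To make the ATOPk query concrete, the following minimal sketch shows the naive baseline that the batch algorithms are meant to replace: each preference is evaluated as an independent top-k scan. It is not the block indexed nested loops or view-based algorithm of the thesis. The linear (weighted-sum) form of the preference functions, the function names `top_k` and `all_top_k`, and the toy data are all assumptions made for illustration.

```python
import heapq


def top_k(objects, weights, k):
    """Return the ids of the k objects with the highest weighted-sum score.

    Assumption: a preference function is a weight vector applied linearly
    to an object's feature vector.
    """
    def score(item):
        _, features = item
        return sum(w * f for w, f in zip(weights, features))

    # nlargest returns the k best items in descending score order.
    best = heapq.nlargest(k, objects.items(), key=score)
    return [obj_id for obj_id, _ in best]


def all_top_k(objects, preferences, k):
    """Naive ATOPk baseline: one full scan of the objects per preference."""
    return {user: top_k(objects, weights, k)
            for user, weights in preferences.items()}


# Hypothetical toy data: three products with two features each,
# and three users with different preference weights.
objects = {"a": (0.9, 0.1), "b": (0.5, 0.5), "c": (0.1, 0.9)}
preferences = {"u1": (1.0, 0.0), "u2": (0.0, 1.0), "u3": (0.7, 0.3)}

result = all_top_k(objects, preferences, k=2)
```

The cost of this baseline grows with the number of preferences times the number of objects, which is the redundancy that grouping similar preferences aims to reduce.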
Moreover, an all top-k query helps to evaluate other queries that rely on the results of multiple top-k queries, such as the reverse top-k queries and top-m influential queries proposed in previous works. It is shown that applying the view-based approach to these queries can improve on the performance of the current state of the art by orders of magnitude.

Finally, the problem of spatio-textual similarity joins (ST-SJOIN) on multidimensional data is considered. Given both spatial and textual information, ST-SJOIN retrieves pairs of objects that are both spatially close and textually similar. One possible application of this query is friendship recommendation, by matching people who not only live nearby but also share common interests. By combining the state-of-the-art strategies of spatial distance joins and set similarity joins, efficient query processing algorithms are proposed that take both spatial and textual constraints into account. A batch processing strategy is also introduced to boost performance; it is effective for the original textual-only joins as well. Using synthetic and real datasets, it is shown that the proposed techniques outperform the baseline solutions.
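As an illustration of what an ST-SJOIN result contains, the sketch below checks every pair of objects against a spatial and a textual threshold. It is a naive nested-loop baseline, not the proposed algorithms. Euclidean distance, Jaccard similarity, the parameter names `eps` and `theta`, and the toy data are assumptions chosen for illustration.

```python
from itertools import combinations
from math import dist


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def st_sjoin_naive(objects, eps, theta):
    """Return id pairs that are within distance eps AND have
    set similarity of at least theta.

    Assumption: each object is (location, keyword_set); this quadratic
    scan only defines the join result and ignores efficiency.
    """
    pairs = []
    for (id1, (loc1, tags1)), (id2, (loc2, tags2)) in combinations(objects.items(), 2):
        if dist(loc1, loc2) <= eps and jaccard(tags1, tags2) >= theta:
            pairs.append((id1, id2))
    return pairs


# Hypothetical toy data for the friendship recommendation example.
people = {
    "ann": ((0.0, 0.0), {"hiking", "jazz", "chess"}),
    "bob": ((0.3, 0.4), {"hiking", "jazz", "films"}),
    "cat": ((5.0, 5.0), {"hiking", "jazz", "chess"}),
    "dan": ((0.1, 0.1), {"golf"}),
}

matches = st_sjoin_naive(people, eps=1.0, theta=0.5)
```

Here only `ann` and `bob` qualify: `cat` shares all of `ann`'s interests but lives too far away, while `dan` lives nearby but shares no interests.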