DB-IR integration and its application to a massively-parallel search engine

Kyu-Young Whang

doi:10.1145/1645953.1645954

Abstract

With the growing need to integrate the DBMS (for structured data) with Information Retrieval (IR) features (for unstructured data), DB-IR integration is becoming one of the major challenges in the database area [1,2]. The extensible architectures provided by commercial object-relational DBMS (ORDBMS) vendors can be used for DB-IR integration. In these architectures, extensions are implemented using a high-level (typically SQL-level) interface. We call this architecture loose-coupling. Its advantage is ease of implementation. However, loose-coupling is not well suited to implementing new data types and operations in large databases when high performance is required. In this talk, we present a new DBMS architecture applicable to DB-IR integration, which we call tight-coupling. In tight-coupling, new data types and operations are integrated into the core of the DBMS engine, in the extensible type layer. They are thus incorporated as first-class citizens[1] within the DBMS architecture and are supported consistently and with high performance. This tight-coupling architecture is being used to incorporate IR features and spatial database features into the Odysseus ORDBMS, which has been under development at KAIST/AITrc for over 19 years. We introduce Odysseus and explain its tightly-coupled IR features (U.S. patented in 2002[2]). We then demonstrate the performance advantage of tight-coupling with benchmark results. Using Odysseus, we have built a web search engine capable of managing 100 million web pages per node in a non-parallel configuration. This engine has been successfully tested in many commercial environments. This work won the Best Demonstration Award at the IEEE ICDE conference held in Tokyo, Japan, in April 2005[3]. Finally, we present the design of a massively-parallel search engine using Odysseus. Recently, parallel search engines have been implemented on top of scalable distributed file systems (e.g., GFS). Nevertheless, building a massively-parallel search engine on a DBMS can be an attractive alternative, since a DBMS offers a higher-level (i.e., SQL-level) interface than a distributed file system while still providing scalability. The parallel search engine we have designed is capable of indexing 30 billion web pages with performance comparable to or better than that of state-of-the-art search engines.
