A Suffix Array-Based Path Query Processing Technique for Semantic Web Data

Sung-Wan Kim

doi:10.9708/jksci/2012.17.10.107

Abstract

Semantic technologies, which aim to let computers understand and automatically process the meaning of interlinked data on the Web, are spreading. In the Semantic Web, processing data means not only accessing the data itself but also understanding and accessing the associations between data, that is, the meaning that links one item to another. The W3C established RDF (Resource Description Framework) as the standard format for representing Semantic Web data and their associations, and several RDF query languages have been proposed to support query processing over RDF data. However, query language definitions that take semantic associations into account, and the query processing techniques that go with them, still require further research. In this paper, building on a suffix array-based indexing scheme previously introduced for RDF query processing, we propose a method for processing $\rho$-path queries, a representative type of semantic association. To evaluate its performance, we implemented two other query processing approaches and compared them experimentally. Measured by average query processing time, the proposed approach outperforms the other two approaches by factors of about 1.8 to 2.5 and 3.8 to 11, respectively.
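The abstract does not spell out the index layout or the query algorithm, so the following is only a minimal sketch of the general idea, not the paper's implementation. It assumes the RDF graph is a directed acyclic graph, that a $\rho$-path query asks for the directed paths from a resource x to a resource y, and that the index is a sorted list of suffixes of the root-to-leaf paths. All function names (`root_to_leaf_paths`, `build_suffix_array`, `rho_paths`) and the sample triples are hypothetical.

```python
from bisect import bisect_left, bisect_right


def root_to_leaf_paths(triples):
    """Enumerate root-to-leaf paths of a DAG as alternating node/predicate tokens.

    Assumption: `triples` is a list of (subject, predicate, object) and contains no cycles.
    """
    out_edges, has_incoming, nodes = {}, set(), set()
    for s, p, o in triples:
        out_edges.setdefault(s, []).append((p, o))
        has_incoming.add(o)
        nodes.update((s, o))

    paths = []

    def walk(node, prefix):
        edges = out_edges.get(node)
        if not edges:  # leaf reached: record the complete path
            paths.append(tuple(prefix))
            return
        for p, o in edges:
            walk(o, prefix + [p, o])

    for root in sorted(nodes - has_incoming):
        walk(root, [root])
    return paths


def build_suffix_array(paths):
    """Sort every suffix that starts at a node position (even offset)."""
    suffixes = [path[i:] for path in paths for i in range(0, len(path), 2)]
    suffixes.sort()
    return suffixes


def rho_paths(suffixes, x, y):
    """Return the distinct directed paths from x to y found in the index."""
    first_tokens = [s[0] for s in suffixes]  # sorted, because suffixes are sorted
    lo = bisect_left(first_tokens, x)
    hi = bisect_right(first_tokens, x)
    found = set()
    for suffix in suffixes[lo:hi]:  # only suffixes that begin at x
        for j in range(2, len(suffix), 2):  # later node positions
            if suffix[j] == y:
                found.add(suffix[: j + 1])
    return sorted(found)


# Hypothetical sample data: two distinct paths lead from "a" to "c".
triples = [
    ("a", "knows", "b"),
    ("b", "worksAt", "c"),
    ("a", "livesIn", "d"),
    ("d", "locatedIn", "c"),
]
index = build_suffix_array(root_to_leaf_paths(triples))
for path in rho_paths(index, "a", "c"):
    print(" ".join(path))
```

Because the suffixes are sorted, all candidates that start at x form one contiguous range that binary search can locate, which is the property a suffix array contributes here. This sketch rebuilds the list of first tokens on every query for brevity; a real index would keep it precomputed.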
