Abstract

This paper develops a query language for sequence databases, such as genome databases and text databases. Unlike queries over relational data, queries over sequential data can easily produce infinite answer sets, since the universe of sequences is infinite, even for a finite alphabet. The challenge is to develop query languages that are both highly expressive and finite. This paper develops such a language. It is a subset of a recently developed logic called Sequence Datalog [19]. Sequence Datalog distinguishes syntactically between subsequence extraction and sequence construction. Extraction creates sequences of bounded length and leads to safe recursion, while construction can create sequences of arbitrary length and leads to unsafe recursion. In this paper, we develop syntactic restrictions for Sequence Datalog that allow sequence construction but preserve finiteness. The main idea is to use safe recursion to control and limit unsafe recursion. The main results are the definition of a finite form of recursion, called domain bounded recursion, and a characterization of its complexity and expressive power. Although finite, the resulting class of programs is highly expressive, since its data complexity is complete for the elementary functions.
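The contrast between extraction and construction can be illustrated with a minimal Python sketch. This is not Sequence Datalog syntax, and the function names (`subsequences`, `extraction_closure`, `construction_steps`) are hypothetical names chosen for this illustration. It assumes sequences are plain strings and uses "append a sequence to itself" as a stand-in for a constructive rule.

```python
def subsequences(s):
    """All contiguous subsequences of s, including the empty sequence."""
    return {s[i:j] for i in range(len(s) + 1) for j in range(i, len(s) + 1)}


def extraction_closure(db):
    """Close a database under subsequence extraction.

    Every result is no longer than the longest input sequence, so the
    answer set is finite for any finite database.
    """
    out = set()
    for s in db:
        out |= subsequences(s)
    return out


def construction_steps(db, rounds):
    """Apply a constructive rule (x -> x + x) for a fixed number of rounds.

    Each round can double the longest sequence, so without the explicit
    `rounds` bound (an assumption of this sketch) the process never
    reaches a fixpoint on a non-empty sequence.
    """
    facts = set(db)
    for _ in range(rounds):
        facts |= {x + x for x in facts}
    return facts


if __name__ == "__main__":
    extracted = extraction_closure({"acgt"})
    print(len(extracted), max(len(s) for s in extracted))

    built = construction_steps({"ab"}, 3)
    print(sorted(len(s) for s in built))
```

In the sketch, extraction stops on its own because no new sequence can exceed the input length, while construction only stops because the caller supplies a bound. The paper's approach replaces such an external bound with syntactic restrictions on the program itself.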

Highlights

  • It is widely accepted that relational databases do not provide enough support for many of today’s advanced applications

  • In [19], we showed how networks of Generalized Sequence Transducers could be expressed in Sequence Datalog

  • This paper addresses the ability of query languages to express sequence functions


Summary

Introduction

It is widely accepted that relational databases do not provide enough support for many of today’s advanced applications. In other cases, such as genome databases [12] and text databases [14], there is still a need for more flexibility in data representation and manipulation. In these applications, much of the data has an inherently sequential structure. Two safe subsets of the logic were defined, based on a new computational model called Generalized Sequence Transducers. These machines are a simple yet powerful device for computing sequence mappings. We take a different approach: instead of computational definitions, we develop syntactic restrictions that guarantee finiteness and safety. This provides an alternate view of finite computations in the logic. The first result is a syntactically defined class of Sequence Datalog programs that guarantees finiteness and safety. We call these programs domain bounded programs.

Overview of Sequence Datalog
Controlling Constructive Recursion
Preliminary Definitions
The Finiteness Problem
Domain Bounded Recursion
Reasoning about Length
Constrained Variables
Complexity and Expressibility