Abstract

Bicluster pattern discovery plays a key role in the analysis of gene expression data. One important bicluster mining model is the Order-Preserving SubMatrix (OPSM), which captures a similar expression tendency of a subset of genes across a subset of conditions. Most OPSM discovery methods are batch mining techniques and are not suitable for low-latency data queries. To make data analysis efficient and effective, in this paper we first propose a prefix-tree based indexing method, pfTree, and then give an optimization technique, pIndex, that employs row and column header tables to search for positive, negative and time-delayed OPSMs. We also present an online sharing query technique to accelerate frequent searches. Finally, we conduct extensive experiments and compare our methods with existing approaches. Experimental results demonstrate the efficiency and effectiveness of the proposed methods.

Highlights

  • Gene microarray technology makes it possible to monitor the expression levels of a huge number of genes across many experiments simultaneously

  • Order-Preserving SubMatrix (OPSM) QUERIES: We explore multiple types of OPSM queries, including positive, negative, and time-delayed OPSM queries, based on pIndex with two header tables

  • GENERAL OPSM QUERIES: Based on the positive OPSM query method, we present a general query method for multiple types of OPSM search, Algorithm 10, which consists of the positive OPSM query, the negative OPSM query, and the time-delayed OPSM query

Summary

INTRODUCTION

Gene microarray technology makes it possible to monitor the expression levels of a huge number of genes across many experiments simultaneously. To improve query efficiency, two header tables are added to pfTree, and the resulting structure is named pIndex. Both structures can index two kinds of data, i.e., gene expression data and OPSM data, and OPSMs can be queried on them directly, which eliminates the process of mining OPSMs from gene expression data. pIndex uses the row and column header tables to update the index and to query OPSMs. To further improve query performance, two pruning methods are proposed to reduce the traversal of useless branches. Because some queries, especially fuzzy queries, take more than one second, an online sharing query technique is proposed to reduce the cost of frequent and time-consuming searches. It applies the two indexes, pfTree and pIndex, to two kinds of datasets, i.e., gene expression and OPSM datasets.
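
To make the indexing idea concrete, the following is a minimal sketch of a prefix tree over column orderings, not the pfTree or pIndex of the paper. It assumes that each gene is indexed by the sequence of its condition (column) indices sorted by ascending expression value, and that a positive OPSM query asks for the genes whose ordering contains a given column sequence as a subsequence. The names `column_order`, `PrefixTree`, `insert` and `query` are hypothetical, and the header tables and pruning methods are omitted.

```python
def column_order(row):
    """Return the column indices of `row` sorted by ascending expression value."""
    return tuple(sorted(range(len(row)), key=lambda c: row[c]))


class PrefixTreeNode:
    def __init__(self):
        self.children = {}   # column index -> child node
        self.genes = set()   # genes whose ordering passes through this node


class PrefixTree:
    """Illustrative prefix tree over column orderings (assumed structure)."""

    def __init__(self):
        self.root = PrefixTreeNode()

    def insert(self, gene, row):
        # Walk down the tree along the gene's column ordering,
        # recording the gene at every node on the path.
        node = self.root
        for col in column_order(row):
            node = node.children.setdefault(col, PrefixTreeNode())
            node.genes.add(gene)

    def query(self, pattern):
        """Return genes whose ordering contains the non-empty `pattern` as a subsequence."""
        result = set()

        def walk(node, matched):
            if matched == len(pattern):
                # Every gene passing through this node has already matched the pattern.
                result.update(node.genes)
                return
            for col, child in node.children.items():
                walk(child, matched + 1 if col == pattern[matched] else matched)

        walk(self.root, 0)
        return result


# Toy expression matrix: three genes measured under four conditions.
tree = PrefixTree()
tree.insert("g1", [1.0, 3.0, 2.0, 4.0])  # ordering (0, 2, 1, 3)
tree.insert("g2", [0.5, 2.5, 1.5, 0.1])  # ordering (3, 0, 2, 1)
tree.insert("g3", [5.0, 1.0, 2.0, 3.0])  # ordering (1, 2, 3, 0)

print(sorted(tree.query((0, 2, 1))))  # ['g1', 'g2']
print(sorted(tree.query((1, 2))))     # ['g3']
```

In this sketch a query is answered by traversing the index only, without re-mining the expression matrix, which is the property the paragraph above attributes to pfTree and pIndex.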

PRELIMINARIES
OPSM QUERIES
ONLINE SHARING QUERIES
EXPERIMENTAL EVALUATION
Findings
CONCLUSION