Scientific Data Mining: A Case Study

Chia-Yo Chang,Roger K Chang,Jason T L Wang

doi:10.1142/s0218194098000078

Abstract

Scientific data mining is the activity of finding significant information in scientific data. This paper presents an example of scientific data mining: the discovery of approximately common patterns in RNA secondary structures. We represent an RNA secondary structure by an ordered labeled tree based on a previously proposed scheme. The patterns in the trees are substructures that can differ in both substitutions and deletions/insertions of nodes of the trees. Our techniques incorporate approximate tree matching algorithms and novel heuristics for discovery and optimization. Experimental results obtained by running these algorithms on both generated data and RNA secondary structures show the good performance of the algorithms. It is shown that the optimization heuristics speed up the discovery algorithm by a factor of 10. Moreover, our optimized approach is 100,000 times faster than the brute force method.

Full Text