A Data-Driven Multidimensional Indexing Method for Data Mining in Astrophysical Databases

Marco Frailis, Vito Roberto, Alessandro De Angelis

doi:10.1155/asp.2005.2514

Abstract

Large archives and digital sky surveys with sizes of the order of terabytes (10^12 bytes) currently exist, and in the near future they will grow considerably larger. Numerical simulations are also producing comparable volumes of information. Data mining tools are needed to extract information from such large datasets. In this work, we propose a multidimensional indexing method, based on a static R-tree data structure, to efficiently query and mine large astrophysical datasets. We follow a top-down construction method, called VAMSplit, which recursively splits the dataset on a near-median element along the dimension with maximum variance. The resulting index partitions the dataset into nonoverlapping bounding boxes whose volumes depend on the local data density (smaller boxes where the data are denser). Finally, we show an application of this method to the detection of point sources in a gamma-ray photon list.
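As an illustration of the top-down construction summarized above, the following Python sketch builds the leaf level of a VAMSplit-style partition for points held in memory. It is a sketch under stated assumptions, not the authors' implementation: the names `vamsplit`, `bounding_box`, and `capacity` are hypothetical, and the "near median" rule is assumed to be the multiple of the leaf capacity closest to the median, so that the left part of every split fills whole leaves. The directory levels of the R-tree and the on-disk layout are omitted.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner)


def variance(values: Sequence[float]) -> float:
    """Population variance of a non-empty sequence of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def bounding_box(points: Sequence[Point]) -> Box:
    """Minimum bounding rectangle (MBR) of a non-empty point set."""
    dims = range(len(points[0]))
    lower = tuple(min(p[d] for p in points) for d in dims)
    upper = tuple(max(p[d] for p in points) for d in dims)
    return lower, upper


def vamsplit(points: List[Point], capacity: int) -> List[Tuple[Box, List[Point]]]:
    """Recursively partition `points` into leaves of at most `capacity` points.

    At each step: choose the dimension with maximum variance, sort along it,
    and split at the multiple of `capacity` closest to the median
    (assumed near-median rule), so the left part fills whole leaves.
    """
    n = len(points)
    if n <= capacity:
        return [(bounding_box(points), points)]
    dims = len(points[0])
    split_dim = max(range(dims), key=lambda d: variance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[split_dim])
    k = max(1, int(n / (2 * capacity) + 0.5))  # full leaves on the left side
    split = k * capacity                        # always < n when n > capacity
    return vamsplit(ordered[:split], capacity) + vamsplit(ordered[split:], capacity)


random.seed(0)
photons = [(random.gauss(0, 1), random.gauss(0, 5)) for _ in range(1000)]
leaves = vamsplit(photons, capacity=64)
print(len(leaves))  # 16, i.e. ceil(1000 / 64)
```

Because each split separates points along a single coordinate, sibling boxes can touch but never overlap; and since every leaf holds about the same number of points, boxes shrink in dense regions and grow in sparse ones.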

Highlights

At present, several projects for the multiwavelength observation of the universe are underway, for example, the Sloan Digital Sky Survey (SDSS), GALEX, POSS2, and DENIS [1].
We propose a point source detection algorithm based on kernel methods [15], in particular on one-class support vector machines (SVMs) [16].
The one-class SVM algorithm estimates the support of a multidimensional distribution, that is, a binary function that is nonzero on a region containing most of the data.
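The one-class SVM mentioned in the highlights can be illustrated with a small, self-contained solver. The sketch below assumes Schölkopf's formulation with a Gaussian (RBF) kernel: it minimizes (1/2) a^T K a subject to 0 <= a_i <= 1/(nu n) and sum(a) = 1, using a basic SMO-style pairwise update, and returns the decision function f(x) = sum_i a_i k(x_i, x) - rho, which is nonnegative on the estimated support. It is not the authors' implementation; the names `one_class_svm` and `rbf_kernel`, the toy data, and the values of `nu` and `gamma` are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(A: np.ndarray, B: np.ndarray, gamma: float) -> np.ndarray:
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def one_class_svm(X: np.ndarray, nu: float = 0.1, gamma: float = 0.5,
                  tol: float = 1e-6, max_iter: int = 100_000):
    """One-class SVM dual solved with a basic SMO (maximal violating pair) loop.

    minimize 0.5 * a^T K a  subject to  0 <= a_i <= 1/(nu*n),  sum(a) = 1,
    assuming 0 < nu < 1. Returns (alpha, rho, decision).
    """
    n = len(X)
    C = 1.0 / (nu * n)                    # upper bound on each multiplier
    K = rbf_kernel(X, X, gamma)
    alpha = np.zeros(n)
    m = min(int(nu * n), n)
    alpha[:m] = C                         # feasible start with sum(alpha) == 1
    if m < n:
        alpha[m] = max(0.0, 1.0 - alpha.sum())
    g = K @ alpha                         # gradient of the dual objective
    for _ in range(max_iter):
        up = alpha < C - 1e-12            # multipliers that may increase
        down = alpha > 1e-12              # multipliers that may decrease
        if not up.any() or not down.any():
            break
        i = np.flatnonzero(up)[np.argmin(g[up])]
        j = np.flatnonzero(down)[np.argmax(g[down])]
        if g[j] - g[i] < tol:             # KKT conditions hold within tol
            break
        eta = max(K[i, i] + K[j, j] - 2.0 * K[i, j], 1e-12)
        t = min((g[j] - g[i]) / eta, C - alpha[i], alpha[j])  # clipped step
        alpha[i] += t                     # move weight from j to i
        alpha[j] -= t
        g += t * (K[:, i] - K[:, j])
    free = (alpha > 1e-8) & (alpha < C - 1e-8)
    if free.any():
        rho = float(g[free].mean())       # g_k == rho on unbounded SVs
    else:
        rho = float(0.5 * (g[alpha < C - 1e-12].min() + g[alpha > 1e-12].max()))

    def decision(Z: np.ndarray) -> np.ndarray:
        return rbf_kernel(np.atleast_2d(Z), X, gamma) @ alpha - rho

    return alpha, rho, decision


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))             # toy 2-D sample, not photon data
alpha, rho, f = one_class_svm(X, nu=0.1, gamma=0.5)
print(f(np.array([[0.0, 0.0], [10.0, 10.0]])))  # >= 0 means inside the support
```

In this formulation nu upper-bounds the fraction of training points that fall outside the estimated support, and gamma sets the width of the kernel.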

Summary

INTRODUCTION

Several projects for the multiwavelength observation of the universe are underway, for example, SDSS, GALEX, POSS2, and DENIS [1]. Typical queries required to analyze these data are the following: (i) point queries, to find all objects overlapping the query point; (ii) range queries, to find all objects having at least one point in common with a query window; and (iii) nearest-neighbor queries, to find the objects at minimum distance from the query object. Another important operation is the spatial join, which in astrophysics is needed to search multiple source catalogs and cross-identify sources from different wavebands.

These multidimensional (spatial) data tend to be large (sky maps can reach sizes of terabytes), so they must be kept in secondary storage; moreover, there is no total ordering on spatial objects that preserves spatial proximity [4]. This makes it difficult to use traditional indexing methods, such as B+-trees or linear hashing.
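To make the query types listed above concrete, here is a hedged Python sketch of a range query and a best-first nearest-neighbor search over the leaves of a flat, single-level index. The leaf builder is a deliberately naive stand-in (sort on x and cut into fixed-size chunks) rather than the paper's R-tree, and `build_leaves`, `range_query`, `nearest`, and `mindist` are hypothetical names. What the sketch shows is the pruning: a leaf is read only if its bounding box could contain an answer.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[Point, Point]                 # (lower-left, upper-right)
Leaf = Tuple[Box, List[Point]]


def mbr(points: Sequence[Point]) -> Box:
    """Minimum bounding rectangle of a non-empty point set."""
    xs, ys = zip(*points)
    return (min(xs), min(ys)), (max(xs), max(ys))


def build_leaves(points: List[Point], capacity: int) -> List[Leaf]:
    """Stand-in leaf builder: sort on x and cut into chunks of `capacity`."""
    ordered = sorted(points)
    chunks = [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]
    return [(mbr(c), c) for c in chunks]


def intersects(box: Box, window: Box) -> bool:
    """True if two axis-aligned rectangles share at least one point."""
    (bx0, by0), (bx1, by1) = box
    (wx0, wy0), (wx1, wy1) = window
    return bx0 <= wx1 and wx0 <= bx1 and by0 <= wy1 and wy0 <= by1


def range_query(leaves: List[Leaf], window: Box) -> List[Point]:
    """All points inside `window`; leaves whose MBR misses it are skipped."""
    (wx0, wy0), (wx1, wy1) = window
    return [p for box, pts in leaves if intersects(box, window)
            for p in pts if wx0 <= p[0] <= wx1 and wy0 <= p[1] <= wy1]


def mindist(q: Point, box: Box) -> float:
    """Smallest possible distance from q to any point inside box."""
    (x0, y0), (x1, y1) = box
    dx = max(x0 - q[0], 0.0, q[0] - x1)
    dy = max(y0 - q[1], 0.0, q[1] - y1)
    return math.hypot(dx, dy)


def nearest(leaves: List[Leaf], q: Point) -> Optional[Point]:
    """Best-first nearest-neighbor search with MINDIST pruning."""
    heap = [(mindist(q, box), i) for i, (box, _) in enumerate(leaves)]
    heapq.heapify(heap)
    best, best_d = None, math.inf
    while heap:
        d, i = heapq.heappop(heap)
        if d >= best_d:
            break                         # no remaining leaf can hold a closer point
        for p in leaves[i][1]:
            dp = math.dist(q, p)
            if dp < best_d:
                best, best_d = p, dp
    return best


pts = [(float(i % 10), float(i // 10)) for i in range(100)]  # 10 x 10 grid
leaves = build_leaves(pts, capacity=8)
print(len(range_query(leaves, ((2.0, 2.0), (4.0, 3.0)))))  # 6 (3 x 2 grid points)
print(nearest(leaves, (3.2, 7.9)))                          # (3.0, 8.0)
```

A point query is the special case of a range query whose window degenerates to the query point, for example `range_query(leaves, (q, q))`. A spatial join between two catalogs could be built from the same box-intersection test applied to pairs of leaves from two indexes.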

AN OPTIMIZED R-TREE

Determination of the tree topology

The split strategy

TESTS ON A PHOTON DATASET

NEIGHBORHOOD AND “WEAK” ADJACENCY

A STRATEGY FOR THE DETECTION OF POINT SOURCES

One-class SVM

Scaling the one-class method with the optimized R-tree

Tests on the anticenter region

CONCLUSIONS

Journal: EURASIP Journal on Advances in Signal Processing
Publication Date: Sep 14, 2005
Citations: 1
License type: cc-by

Similar Papers

Data Mining in Gamma Astrophysics Experiments
Marco Frailis ... Alessandro De Angelis
01 Jan 2006

Structurally Constrained Anisotropic Multi-Wave-Inversion Utilizing Machine Learning and Big Data on a Middle East OBC Project
V Prieux ... T Bardainne
01 Jan 2020

Summarizing Large News Video Archives by Event Ranking
Duy-Dinh Le ... Shin'Ichi Satoh
01 Sep 2011

Data Mining and Analysis of Large Scale Time Series Network Data
P Morreale ... S Holtz
01 Mar 2013