Abstract

Astronomers systematically study the sky with large sky surveys. A common feature of modern sky surveys is the volume of data they produce: from hundreds of terabytes (TB) to 100 petabytes (PB) or more, in both the image data archive and the object catalogs. For example, the LSST will produce a 20–40 PB catalog database. Large sky surveys have enormous potential to enable countless astronomical discoveries. Such discoveries will span the full spectrum of statistics: from rare one-in-a-billion (or one-in-a-trillion) object types to complete statistical and astrophysical specifications of many classes of objects, based upon millions of instances of each class. The growth in data volumes requires more effective knowledge discovery and extraction algorithms. Among these are algorithms for outlier (novelty/surprise/anomaly) detection. Outlier detection algorithms enable scientists to discover the most “interesting” scientific knowledge hidden within large, high-dimensional datasets: the “unknown unknowns”. Effective outlier detection is essential for the rapid discovery of potentially interesting or hazardous events. Emerging unexpected conditions in hardware, software, or network resources need to be detected, characterized, and analyzed as soon as possible for system health and safety reasons. Likewise, emerging behaviors and variations in scientific targets should be detected and characterized promptly to enable rapid decision support in response to such events. We have developed a new algorithm for outlier detection, KNN-DD (K-Nearest Neighbor Data Distributions). We report results from preliminary experiments that measure the algorithm’s precision and recall for known outliers, and its ability to distinguish between characteristically different data distributions among different classes of objects.
