Abstract

Existing distance-based outlier mining methods do not consider the importance of each attribute, which results in poor mining accuracy. To address this problem, we propose a new outlier mining algorithm, Miner*, that uses information entropy and a weighted distance sum to substantially improve mining accuracy. Miner* employs information entropy to determine weights that indicate the importance of each data attribute. It then reduces the input dataset with a neighbour-radius-based pruning technique, removing data objects that are unlikely to be outliers and thereby obtaining a candidate outlier set. For each object in the candidate outlier set, Miner* calculates the weighted distance sum W_k; the objects whose W_k values rank in the top n are regarded as outliers. Because the distance sum takes full advantage of the clustering characteristics of the dataset, data objects distributed at the edges and local outliers can be mined effectively. To demonstrate the effectiveness of Miner*, we implement it in a prototype system that detects star spectrum data objects with abnormal characteristic lines. Experimental results on UCI and star spectrum datasets show that Miner* achieves high accuracy and high scalability with little man-made influence. The results also confirm that Miner* is feasible and effective in mining spectrum data with abnormal characteristic lines from massive star spectrum datasets.
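
The ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the sketch assumes that each attribute's weight is proportional to its Shannon entropy after equal-width binning, that W_k is the sum of weighted Euclidean distances to an object's k nearest neighbours, and that larger W_k means more outlying. The neighbour-radius-based pruning step is omitted, and all function names and parameters (entropy_weights, weighted_distance_sum, top_n_outliers, bins, k, n) are hypothetical.

```python
import math
from collections import Counter


def entropy_weights(rows, bins=4):
    """Assumed weighting scheme: Shannon entropy of each attribute
    (after equal-width binning), normalised so the weights sum to 1."""
    n_attrs = len(rows[0])
    entropies = []
    for j in range(n_attrs):
        col = [r[j] for r in rows]
        lo, hi = min(col), max(col)
        width = (hi - lo) / bins or 1.0  # constant column: use one bin
        counts = Counter(min(int((v - lo) / width), bins - 1) for v in col)
        h = -sum((c / len(col)) * math.log2(c / len(col))
                 for c in counts.values())
        entropies.append(h)
    total = sum(entropies)
    if total == 0:  # every attribute is constant: fall back to equal weights
        return [1.0 / n_attrs] * n_attrs
    return [h / total for h in entropies]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two objects."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def weighted_distance_sum(rows, weights, k):
    """Assumed W_k: sum of weighted distances to the k nearest neighbours."""
    scores = []
    for i, p in enumerate(rows):
        dists = sorted(weighted_distance(p, q, weights)
                       for j, q in enumerate(rows) if j != i)
        scores.append(sum(dists[:k]))
    return scores


def top_n_outliers(rows, k=3, n=1, bins=4):
    """Return the indices of the n objects with the largest W_k."""
    weights = entropy_weights(rows, bins)
    scores = weighted_distance_sum(rows, weights, k)
    ranked = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Toy data (made up for illustration): five clustered points and one distant point.
data = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (1.0, 0.8), (8.0, 9.0)]
outliers = top_n_outliers(data, k=3, n=1)
```

On this toy dataset the distant point at index 5 has the largest distance sum, so it is the single object returned. A brute-force neighbour search like this costs quadratic time in the number of objects, which is the kind of cost a pruning step is meant to reduce.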
