Abstract

Many real-world problems involve big data with a large number of recorded fields and/or attributes. In such cases, data mining requires dimension reduction techniques, because conventional clustering methods face serious challenges when dealing with big data. Subspace selection is one of the most important dimension reduction techniques: a selected set of subspaces is substituted for the problem's full dataset, and clustering is performed on this set. This article introduces the Shared Subscribe Hyper Simulation Optimization (SUBHSO) algorithm, which provides optimized cluster centres for a set of subspaces. SUBHSO runs an optimization loop in which particle swarm optimization (PSO) modifies and optimizes the coordinates of the cluster centres, while the fitness function is calculated using Monte Carlo simulation. A case study on big data from the Iran electricity market (IEM) shows that, relative to other dimension reduction algorithms, SUBHSO improves the defined fitness function, which represents cluster cohesion and separation.
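The optimization loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not give the fitness formula, so `mc_fitness` assumes a hypothetical cohesion/separation ratio: the smallest distance between centres divided by the mean distance from randomly sampled points to their nearest centre, with the random sampling standing in for the Monte Carlo simulation. The PSO settings (`w`, `c1`, `c2`, swarm size, iterations) are illustrative defaults, and each particle encodes all `k` centres as one flattened vector.

```python
import math
import random


def mc_fitness(centres, data, n_samples, rng):
    """Monte Carlo estimate of an assumed cohesion/separation score (higher is better).

    Cohesion: mean distance from randomly sampled points to their nearest centre.
    Separation: smallest distance between any two centres.
    Returns separation / cohesion; this ratio is a hypothetical stand-in fitness.
    """
    sample = [rng.choice(data) for _ in range(n_samples)]
    cohesion = sum(min(math.dist(p, c) for c in centres) for p in sample) / n_samples
    separation = min(
        math.dist(centres[i], centres[j])
        for i in range(len(centres))
        for j in range(i + 1, len(centres))
    )
    return separation / (cohesion + 1e-9)


def pso_centres(data, k, n_particles=10, iters=40, n_samples=50,
                w=0.7, c1=1.5, c2=1.5, seed=0):
    """Optimize k cluster centres with global-best PSO; returns (centres, fitness)."""
    if k < 2:
        raise ValueError("separation needs at least two centres")
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dim)]
    hi = [max(p[d] for p in data) for d in range(dim)]
    n = k * dim  # one particle = k centres flattened into a single vector

    def unflatten(x):
        return [x[i * dim:(i + 1) * dim] for i in range(k)]

    def score(x):
        return mc_fitness(unflatten(x), data, n_samples, rng)

    # Random initial centres inside the data's bounding box, zero velocity.
    pos = [[rng.uniform(lo[j % dim], hi[j % dim]) for j in range(n)]
           for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [score(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(n):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][j] = (w * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                # Clamp coordinates to the bounding box of the data.
                pos[i][j] = min(max(pos[i][j] + vel[i][j], lo[j % dim]), hi[j % dim])
            f = score(pos[i])
            if f > pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], f
                if f > gbest_f:
                    gbest, gbest_f = pos[i][:], f
    return unflatten(gbest), gbest_f
```

On two well-separated blobs of 2-D points, `pso_centres(points, k=2)` typically places one centre near each blob. Because the fitness is estimated from a random sample it is noisy; seeding `random.Random` keeps runs reproducible.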

Highlights

  • Conventional data mining methods are not suitable for big data analysis, since computing the various distance measurements they rely on within a reasonable time poses serious challenges

  • Correlation-based clustering methods are applied to a set of uncorrelated dimensions, and clusters are created in a new space or its subspaces

  • The focus is on calculating subspaces of the data, which avoids computational complications without affecting clustering accuracy
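One simple way to obtain a set of uncorrelated dimensions, sketched here as an assumption rather than the selection rule used in the article, is to greedily keep each column whose absolute Pearson correlation with every already-kept column stays below a threshold, then project the data onto those columns. The function names and the `threshold=0.8` default are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_uncorrelated_dims(rows, threshold=0.8):
    """Greedily keep column indices whose |correlation| with every kept column is below threshold."""
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) < threshold for k in kept):
            kept.append(j)
    return kept


def project(rows, dims):
    """Restrict every row to the selected subspace (column indices in dims)."""
    return [[row[d] for d in dims] for row in rows]
```

A column that is a linear function of an earlier one (correlation of ±1) is always dropped, while weakly correlated columns survive; the greedy order means the first column of each correlated group is the one kept.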


Summary

INTRODUCTION

Conventional data mining methods are not suitable for big data analysis, since computing the various distance measurements they rely on within a reasonable time poses serious challenges. Another goal is to find a set of attributes that clearly reflects the similarity of the data in a dataset. To this end, many subspace clustering methods have been developed to address the problem of analysing data in the full data space. In all of these methods, the aim is to select a set of subspaces as an appropriate substitute for the whole dataset. This category of methods is more accurate than others, but with large amounts of data its precision drops, because the subspaces can no longer represent the entire dataset well. This article proposes a hybrid clustering algorithm, Shared Subscribe Hyper Simulation Optimization (SUBHSO), to overcome this limitation of subspace methods in the analysis of large data. The rest of this paper covers the model components (Section 2), the Iran electricity market and big data (Section 3), data clustering with the proposed algorithm (Section 4), a comparison of the proposed algorithm with its predecessors in terms of execution (Section 5), and validation of the proposed algorithm (Section 6).
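To make the idea of substituting a subspace for the whole dataset concrete, the sketch below restricts each row to a chosen set of column indices and clusters only that projection with plain Lloyd's k-means. The k-means baseline, the helper name `cluster_in_subspace`, and the choice of `dims` are illustrative assumptions; this is a conventional subspace-clustering step, not SUBHSO itself.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns (centres, labels)."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centre (Euclidean).
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels


def cluster_in_subspace(rows, dims, k, seed=0):
    """Substitute the subspace given by column indices `dims` for the full data, then cluster it."""
    sub = [[row[d] for d in dims] for row in rows]
    return kmeans(sub, k, seed=seed)
```

For instance, if column 0 separates two groups while column 1 is uniform noise, `cluster_in_subspace(rows, dims=[0], k=2)` clusters on the informative column only, so the noise column does not enter the distance computations.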

MODEL COMPONENTS
SUBHSO Algorithm
IRAN ELECTRICITY MARKET AND BIG DATA
DATA CLUSTERING WITH THE PROPOSED ALGORITHM
COMPARISON OF THE PROPOSED ALGORITHM WITH PREDECESSORS IN TERMS OF EXECUTION
VALIDATION OF THE PROPOSED ALGORITHM
CONCLUSION AND RECOMMENDATIONS