Abstract
Big Data, allied with the Internet of Things, now provides a powerful resource that organizations increasingly exploit for applications ranging from decision support and predictive and prescriptive analytics to knowledge extraction and intelligence discovery. In analytics and data mining processes, it is usually desirable to have as much data as possible, though it is often more important that the data be of high quality. Two of the most important problems in handling large datasets are therefore sampling and feature selection. This paper addresses the sampling problem and presents a heuristic method for finding the critical sampling of big datasets. The critical size of a dataset D is the minimum number of samples of D required for a given data analytic task to achieve satisfactory performance. The problem is important in data mining because the size of a dataset relates directly to the cost of executing the data mining task. Since determining the critical size is intractable, we study heuristic methods for finding the critical sampling. Experiments were conducted on several datasets using three versions of the heuristic method. Preliminary results show an apparent critical size for every dataset tested, which is generally much smaller than the size of the whole dataset. Further, the proposed heuristic method provides a practical way to find a useful critical sampling for data mining tasks.
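To make the notion of critical size concrete, the following is a minimal sketch of one generic way to search for it: grow a random sample geometrically and stop once the task's score no longer improves by more than a tolerance. This is not the heuristic evaluated in the paper, which the abstract does not describe. The function name `find_critical_size`, the `score` callback, and the parameters `start`, `growth`, and `tolerance` are all hypothetical and chosen only for illustration.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def find_critical_size(
    data: Sequence[T],
    score: Callable[[List[T]], float],
    start: int = 100,
    growth: float = 2.0,
    tolerance: float = 0.01,
    seed: int = 0,
) -> int:
    """Return the smallest tried sample size after which `score` stops
    improving by more than `tolerance`.

    Assumption: `score` runs the data analytic task on a sample and returns
    a performance value where higher is better. It is a placeholder for
    whatever task and metric are of interest.
    """
    rng = random.Random(seed)
    pool = list(data)
    n = min(start, len(pool))
    prev_score = score(rng.sample(pool, n))

    while n < len(pool):
        # Grow the sample geometrically, always by at least one item,
        # and never beyond the full dataset.
        next_n = min(len(pool), max(n + 1, int(n * growth)))
        next_score = score(rng.sample(pool, next_n))
        # A gain no larger than the tolerance is treated as a plateau.
        if next_score - prev_score <= tolerance:
            return n
        n, prev_score = next_n, next_score

    # No plateau was detected before the whole dataset was used.
    return n
```

Because the search only tries a geometric sequence of sizes and uses a single random draw per size, the value it returns is an estimate of an apparent critical size rather than a true minimum. Averaging `score` over several draws per size would reduce the influence of sampling noise, at a proportionally higher cost.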