Abstract

This research proposes a framework for preliminary data analysis when labels for data instances are not available in the early stage of the process. Preliminary data analysis usually starts by exploring “target interest” features, which can be measures of performance or decision attributes. The main analysis task is then to investigate the factors that are highly correlated with these “target interest” features. Because no exact labels are provided, this exploration and investigation is iterative and time-consuming, especially when the dataset is large. The proposed framework, named NSGAII-SCC, formulates a multi-objective problem that sequentially combines clustering for “target interest” exploration with a classification algorithm for factor investigation. The fast and elitist non-dominated sorting genetic algorithm (NSGAII), integrated with a feature selection mechanism, is designed to search for better clustering and classification solutions. This sequential clustering and classification process aims not only to reveal the hidden patterns of “target interest” but also to identify the features that are highly correlated with the discovered patterns. Two public transactional datasets from Kaggle were used to evaluate the performance of NSGAII-SCC. The experimental results show that NSGAII-SCC performs promisingly at finding better solutions that maintain both the clustering and classification objectives. Additionally, feature selection encoded in the chromosome settings helps to find the relevant features for both clustering and classification learning. The proposed framework is particularly useful as a tool for investigating big transactional data.
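
The two ideas the abstract combines, a chromosome that encodes feature selection and a non-dominated comparison of clustering and classification objectives, can be illustrated with a minimal sketch. This is not the authors' implementation. The chromosome layout (a binary string whose first `n_target` genes select “target interest” features and whose remaining genes select factor features), the function names, and the objective scores are all assumptions made for illustration. The sketch covers only the first non-dominated front, not the full NSGAII loop with crowding distance, crossover, and mutation.

```python
from typing import List, Sequence, Tuple


def decode_chromosome(bits: Sequence[int], n_target: int) -> Tuple[List[int], List[int]]:
    """Split a binary chromosome into two feature selections.

    Assumed (hypothetical) layout: the first n_target genes pick the
    "target interest" features used for clustering; the remaining genes
    pick the candidate factor features used for classification.
    Indices are relative to each block.
    """
    cluster_idx = [i for i, b in enumerate(bits[:n_target]) if b]
    factor_idx = [i for i, b in enumerate(bits[n_target:]) if b]
    return cluster_idx, factor_idx


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance for maximization: a is no worse than b in every
    objective and strictly better in at least one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def first_front(scores: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of solutions that no other solution dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Placeholder objective pairs (clustering quality, classification accuracy).
# In a real run these would come from evaluating each decoded chromosome.
example_scores = [(0.6, 0.7), (0.5, 0.9), (0.4, 0.6), (0.6, 0.8)]
example_front = first_front(example_scores)
example_masks = decode_chromosome([1, 0, 1, 1, 0], n_target=3)
```

In a full pipeline under these assumptions, each chromosome would be decoded, the selected “target interest” features clustered, the resulting cluster assignments used as labels for a classifier trained on the selected factor features, and the two resulting scores passed to the non-dominated comparison.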
