A Novel Optimized Case-Based Reasoning Approach With K-Means Clustering and Genetic Algorithm for Predicting Multi-Class Workload Characterization in Autonomic Database and Data Warehouse System

Nusrat Shaheen, Basit Raza, Ahmad Raza Shahid, Hani Alquhayz

doi:10.1109/access.2020.3000139

Open Access

Journal: IEEE Access	Publication Date: Jan 1, 2020
Citations: 9	License type: CC BY 4.0

Affiliation: COMSATS University Islamabad, Majmaah University

Abstract

Data management systems are essential for any organization that deals with large volumes of data. As data volume and complexity grow, it becomes more challenging for a workload management system to maintain its performance. There is therefore a need for a system that can deal with such complexity autonomically, with little or no human involvement. The performance of these systems can be improved by making them aware of the workload entering the system. The workload of a typical database and data warehouse system can be characterized into three types: Online Transaction Processing (OLTP), Decision Support Systems (DSS), and Mixed. The literature currently reports autonomic characterization of workload into two classes, OLTP and DSS; characterizing the workload into three types, which is a multi-class classification problem, is a more challenging task. In this study, we propose a novel optimized Case-Based Reasoning (CBR) approach based on clustering for autonomically characterizing the workload into multi-class types before it enters the system. We implement the four phases of CBR along with case-base generation and map them to the elements of the autonomic MAPE-K model. In the Retrieve phase, k-means clustering is used to enhance retrieval efficiency, and workload type predictions are made in the Reuse phase. A Genetic Algorithm is used in the Revise and Adapt phase of CBR. A few autonomic self-* characteristics are incorporated to make the approach autonomic. We performed various experiments, and the results show that the proposed model outperforms existing approaches in prediction. We validated the results against other machine learning classifiers using the Friedman test followed by a post-hoc test, which shows that the proposed model stands out as the best classifier.
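The Retrieve and Reuse phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional feature vectors, the toy case base, the Euclidean distance, the deterministic centroid initialization, the majority vote, and the names `kmeans` and `predict` are all assumptions made for illustration. The Genetic Algorithm used in the Revise and Adapt phase is not shown.

```python
import math
from collections import Counter


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, iters=20):
    """Plain k-means. Assumption: the first k points seed the centroids,
    which keeps this toy example deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(i)
        updated = [
            mean([points[i] for i in idx]) if idx else centroids[c]
            for c, idx in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable
            break
        centroids = updated
    return centroids, clusters


def predict(query, case_base, centroids, clusters, n_neighbors=3):
    """Retrieve: search only the cluster whose centroid is nearest.
    Reuse: majority vote over the labels of the closest stored cases."""
    c = min(range(len(centroids)), key=lambda j: dist(query, centroids[j]))
    pool = clusters[c] or range(len(case_base))  # fall back if cluster is empty
    nearest = sorted(pool, key=lambda i: dist(query, case_base[i][0]))
    votes = Counter(case_base[i][1] for i in nearest[:n_neighbors])
    return votes.most_common(1)[0][0]


# Hypothetical case base: (feature vector, workload type).
case_base = [
    ((0.90, 0.10), "OLTP"), ((0.10, 0.90), "DSS"), ((0.50, 0.50), "Mixed"),
    ((0.80, 0.20), "OLTP"), ((0.20, 0.80), "DSS"), ((0.55, 0.45), "Mixed"),
    ((0.85, 0.15), "OLTP"), ((0.15, 0.85), "DSS"), ((0.45, 0.55), "Mixed"),
]
centroids, clusters = kmeans([features for features, _ in case_base], k=3)

print(predict((0.82, 0.18), case_base, centroids, clusters))
```

Restricting the nearest-case search to a single cluster is what makes retrieval cheaper than scanning the whole case base; the trade-off is that a query near a cluster boundary may miss a closer case stored in a neighbouring cluster.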

Highlights

As data volume increases, so does data complexity, which makes data management more difficult
The MySQL database exposes more than 500 variables that record system performance, but not all of them are important, so this study uses six features chosen by a feature selection method
The approach characterizes the workload into three types, Online Transaction Processing (OLTP), Decision Support Systems (DSS), and Mixed, which is a multi-class classification problem

Summary

Introduction

As data volume increases, so does data complexity, which makes data management more difficult. Data management is moving beyond human capability, which encourages the development of intelligent systems. The role of a Database Administrator (DBA) is to manage database activities, including the handling of data. Because data is dynamic and complex, humans cannot handle it efficiently. There is a need to develop intelligent systems with self-management capabilities for data handling.

The associate editor coordinating the review of this manuscript and approving it for publication was Rashid Mehmood.

Methods

Results

Conclusion