Abstract
Data management systems are essential for any organization that deals with large volumes of data. As data volume and complexity grow, it becomes more challenging for a workload management system to maintain its performance. There is therefore a need for a system that can deal with such complexity autonomically, with little or no human involvement. The performance of these systems can be improved by making them aware of the workload entering the system. The workload of a typical database and data warehouse system can be characterized into three types: Online Transaction Processing (OLTP), Decision Support Systems (DSS), and Mixed. The literature currently reports autonomic characterization of workload into two classes, OLTP and DSS; characterizing the workload into three types is a multi-class classification problem and a more challenging task. In this study, we propose a novel optimized Case-based Reasoning (CBR) approach based on clustering for autonomically characterizing the workload into multi-class types before it enters the system. We implement the four phases of CBR along with case-base generation and map them to the elements of the autonomic MAPE-K model. In the Retrieve phase, k-means clustering is used to enhance retrieval efficiency, and workload type predictions are made in the Reuse phase. A Genetic Algorithm is used in the Revise and Adapt phase of CBR. A few self-* characteristics are incorporated to make the approach autonomic. We performed various experiments, and the results show that the proposed model outperforms existing approaches in prediction. We validated the results against other machine learning classifiers using the Friedman test with post-hoc analysis, which shows that the proposed model stands out as the best classifier.
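The Retrieve and Reuse phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the two-feature toy case base, the evenly spaced centroid initialization, the majority vote over three neighbours, and all function names are assumptions made for illustration (the study itself uses six selected features, and the Genetic Algorithm used for Revise and Adapt is not shown).

```python
import math
from collections import Counter

# Hypothetical toy case base: (feature vector, workload type).
# The two features are placeholders, not the six features used in the study.
CASE_BASE = [
    ((0.90, 0.10), "OLTP"), ((0.85, 0.15), "OLTP"), ((0.95, 0.05), "OLTP"),
    ((0.10, 0.90), "DSS"), ((0.15, 0.85), "DSS"), ((0.05, 0.95), "DSS"),
    ((0.50, 0.50), "Mixed"), ((0.55, 0.45), "Mixed"), ((0.45, 0.55), "Mixed"),
]


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda j: dist(point, centroids[j]))


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic, evenly spaced initialization
    (an assumption chosen so the sketch is reproducible)."""
    centroids = points[:: len(points) // k][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def build_index(case_base, centroids):
    """Group stored cases by their nearest centroid."""
    index = {i: [] for i in range(len(centroids))}
    for features, label in case_base:
        index[nearest(features, centroids)].append((features, label))
    return index


def retrieve_and_reuse(query, centroids, index, n_neighbors=3):
    """Retrieve: search only the closest non-empty cluster.
    Reuse: predict by majority vote over the retrieved cases."""
    order = sorted(range(len(centroids)), key=lambda j: dist(query, centroids[j]))
    cluster = next(index[j] for j in order if index[j])
    retrieved = sorted(cluster, key=lambda case: dist(query, case[0]))[:n_neighbors]
    return Counter(label for _, label in retrieved).most_common(1)[0][0]


centroids = kmeans([features for features, _ in CASE_BASE], k=3)
index = build_index(CASE_BASE, centroids)
print(retrieve_and_reuse((0.88, 0.12), centroids, index))  # OLTP
```

Restricting the similarity search to one cluster is what gives the retrieval speed-up: a query is compared against the k centroids and the cases of a single cluster rather than against the whole case base.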
Highlights
As data volume increases, so does data complexity, which makes data management more difficult
The MySQL database exposes more than 500 variables that record system performance, but not all of them are important, so this study uses six features chosen by a feature selection method
The approach characterizes the workload into three types, Online Transaction Processing (OLTP), Decision Support Systems (DSS), and Mixed, which makes this a multi-class classification problem
Summary
As data volume increases, so does data complexity, which makes data management more difficult. Data management is moving beyond human capability, which encourages the development of intelligent systems. The job of a Database Administrator (DBA) is to manage database activities, which include the handling of data. Because data is dynamic and complex, humans cannot handle it efficiently. There is a need to develop intelligent systems with self-management capabilities for data handling. The associate editor coordinating the review of this manuscript and approving it for publication was Rashid Mehmood.