Abstract

Could we detect anomalies at run time by learning from the analysis of traces of previously, normally completed executions of a program? In this paper we build a feature data set from program traces collected at run time, either during the program's regular life or during its testing phase. This data set represents execution traces of relevant variables, including inputs, outputs, intermediate variables, and invariant checks. During a learning step, we start from randomly generated training inputs and map the program traces to a minimal set of conceptual patterns. We employ formal concept analysis to do this incrementally and without losing dependencies between data set features. This set of patterns becomes a reference for checking the normality of future program executions, as it captures invariant functional dependencies between the variables that must be preserved during execution. During the learning step, we cover enough input classes corresponding to the different patterns by sampling random inputs until the set of patterns stabilizes (i.e. the set almost stops changing, and only negligible new patterns are not reducible to it). Experimental results show that the generated patterns are significant in representing normal program executions. They also enable the detection of various forms of executable code contamination at early stages. The proposed method is general and modular. If applied systematically, it enhances software resilience against abnormal and unpredictable events.
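The mapping from traces to conceptual patterns rests on formal concept analysis: from a binary context relating execution traces (objects) to boolean trace features (attributes), one derives all formal concepts, i.e. maximal (extent, intent) pairs. The sketch below is a minimal, naive illustration with a hypothetical toy context; the paper's actual incremental algorithm and feature encoding are not reproduced here.

```python
# Minimal formal concept analysis sketch: enumerate all (extent, intent)
# pairs -- the "conceptual patterns" -- of a toy trace context.
# Each object is one execution trace; each attribute is a boolean feature
# observed in that trace (e.g. an invariant check that held, or a sign
# class of a variable). The context data is illustrative only.

from itertools import combinations

# object -> set of attributes observed in that trace (hypothetical data)
context = {
    "trace1": {"input_pos", "inv_ok", "out_pos"},
    "trace2": {"input_pos", "inv_ok", "out_pos"},
    "trace3": {"input_neg", "inv_ok", "out_neg"},
    "trace4": {"input_neg", "inv_ok", "out_neg"},
}

ALL_ATTRS = set().union(*context.values())

def intent(objs):
    """Attributes shared by every object in objs (all attributes if empty)."""
    sets = [context[o] for o in objs]
    return set.intersection(*sets) if sets else set(ALL_ATTRS)

def extent(attrs):
    """Objects possessing every attribute in attrs."""
    return {o for o, s in context.items() if attrs <= s}

def concepts():
    """Enumerate formal concepts by closing every subset of objects."""
    found = set()
    for r in range(len(context) + 1):
        for combo in combinations(context, r):
            a = intent(set(combo))      # derive the shared intent,
            e = extent(a)               # then close it back to its extent
            found.add((frozenset(e), frozenset(a)))
    return found

for e, a in sorted(concepts(), key=lambda c: len(c[0])):
    print(sorted(e), "->", sorted(a))
```

On this toy context the four concepts recovered are the bottom concept (empty extent, all attributes), one concept per input class, and the top concept whose intent is the invariant `inv_ok` shared by all normal traces; that shared intent is exactly the kind of functional dependency the method treats as a normality reference.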

Highlights

  • Data science and machine learning methods offer new ways of extracting knowledge and reducing big data while preserving the underlying main concepts in a summarized form

  • In [16] we extended it to a multi-level conceptual data reduction approach for fuzzy formal contexts, based on the Łukasiewicz implication definition

  • For anomaly detection, we start by explaining how to discover anomalies using a formal context that abstracts the knowledge about the traces generated by the program


Summary

INTRODUCTION

Data science and machine learning methods offer new ways of extracting knowledge and reducing big data while preserving the underlying main concepts in a summarized form. We propose a novel approach to detect run-time program anomalies at the earliest stages by first learning from normally completed executions. We incrementally generate a knowledge base K containing consistent, reduced conceptual sets of patterns associated with many correct executions of the program. K reflects the normal behavior of the program and is built as the union of all conceptual patterns generated automatically from random inputs. These executions are then used to evaluate the proposed approach.
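The learning phase described above can be sketched as a loop: sample a random input, run the program under observation, abstract the resulting trace into a discrete pattern, add it to K, and stop once K is stable. The program, the trace-to-pattern abstraction, and the stability criterion (`patience` consecutive samples with no new pattern) below are illustrative stand-ins, not the paper's exact procedure.

```python
# Hedged sketch of the learning loop: accumulate the pattern set K from
# random executions until it stabilizes, then use K as the normality
# reference for future executions. All names here are illustrative.

import random

def program(x):
    """Toy program under observation: absolute value plus a parity flag."""
    return abs(x), x % 2 == 0

def trace_pattern(x):
    """Abstract one execution trace into a discrete pattern (illustrative):
    sign class of the input, parity of the input, and an output invariant."""
    y, even = program(x)
    return ("neg" if x < 0 else "nonneg",
            "even" if even else "odd",
            y >= 0)  # invariant check: output is never negative

def learn_patterns(patience=200, seed=0):
    """Sample random inputs until no new pattern appears for `patience`
    consecutive executions -- a simple stand-in for pattern-set stability."""
    rng = random.Random(seed)
    K, unchanged = set(), 0
    while unchanged < patience:
        p = trace_pattern(rng.randint(-1000, 1000))
        if p in K:
            unchanged += 1
        else:
            K.add(p)
            unchanged = 0
    return K

K = learn_patterns()

def is_anomalous(x):
    """Flag an execution whose abstracted pattern was never seen in K."""
    return trace_pattern(x) not in K
```

With this toy abstraction, K converges to the four reachable sign/parity patterns, all carrying the invariant `y >= 0`; a contaminated executable that violated the invariant would produce a pattern outside K and be flagged immediately.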

STATE OF THE ART ABOUT ANOMALIES DETECTION IN PROGRAMS AND DATA
CONCEPTUAL PATTERN EXTRACTION FROM DATA
METHODOLOGY FOR ANOMALY DETECTION
VALIDATION OF THE METHOD AND ITS LIMIT
Findings
CONCLUSION