Software Fault Prediction With an Iterative Fuzzy Logic System Considering Interpretability With Imbalanced Datasets
Users expect software to be error-free; however, preventing faults while software is being developed is difficult. Although predicting faults in software is arduous, it substantially improves software quality. Given the complexity of software and the limits on time and budget, such prediction helps deliver more robust software at lower cost. This paper introduces an iterative method based on fuzzy systems and machine learning to predict software faults. High interpretability, transparency, data balancing, and finding the best interval for converting numerical features to fuzzy features are the basic challenges in predicting software faults. The proposed framework is split into four phases. In the first phase, the crisp inputs are converted to fuzzy sets. In the second phase, a membership function is constructed using triangular fuzzy sets. In the third phase, the training data are balanced and fuzzy rules are generated. In the last phase, the similarity of the inputs to the rules' antecedents is calculated, and the fired rules are aggregated to label the test data. The proposed method is evaluated on the Eclipse, Promise, and Travis repositories. The AUC values of the proposed method on the Promise, Travis, and Eclipse datasets are 89%, 62%, and 87%, respectively, which are comparable to the results obtained by deep learning methods but with higher interpretability and transparency.
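The fuzzification and rule-firing phases described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the three-term partition (low, medium, high), the metric names `loc` and `complexity`, their ranges, the example rules, the `min` operator for rule strength, and the sum used for aggregation are all assumptions chosen for illustration.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a, c and peak b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzify(x, lo, hi):
    """Convert a crisp value into memberships of three assumed terms over [lo, hi]."""
    mid = (lo + hi) / 2.0
    w = mid - lo  # the outer sets extend past the range so the end points get membership 1
    return {
        "low": triangular(x, lo - w, lo, mid),
        "medium": triangular(x, lo, mid, hi),
        "high": triangular(x, mid, hi, hi + w),
    }


def classify(sample, ranges, rules):
    """Fire every rule (min over antecedent memberships) and sum strengths per label."""
    fuzzy = {name: fuzzify(value, *ranges[name]) for name, value in sample.items()}
    scores = {}
    for antecedent, label in rules:
        strength = min(fuzzy[name][term] for name, term in antecedent.items())
        scores[label] = scores.get(label, 0.0) + strength
    return max(scores, key=scores.get)


# Hypothetical metric ranges and rules, for illustration only.
RANGES = {"loc": (0, 1000), "complexity": (0, 50)}
RULES = [
    ({"loc": "high", "complexity": "high"}, "faulty"),
    ({"loc": "low"}, "clean"),
    ({"complexity": "low"}, "clean"),
]

label = classify({"loc": 900, "complexity": 45}, RANGES, RULES)
```

Because each rule is a readable pairing of linguistic terms with a label, the rule list itself serves as the explanation for a prediction, which is the interpretability argument the abstract makes.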
- Research Article
39
- 10.1016/j.knosys.2014.04.047
- Jun 1, 2014
- Knowledge-Based Systems
Modeling of a semantics core of linguistic terms based on an extension of hedge algebra semantics and its application
- Conference Article
- 10.2514/6.2004-5003
- Jun 22, 2004
The fuzzy logic controller (FLC), based on fuzzy set theory, provides a useful tool for converting a linguistic control strategy derived from expert knowledge into automatic control rules. However, systematic tuning methods for fuzzy logic controllers remain under investigation. Usually, FLC elements are tuned by trial and error. In this paper, a new systematic method based on optimization theory is used to tune the FLC scale factors. Objective functions based on the deviation from the set point in the time response are optimized to achieve this goal. Simulation examples of aircraft longitudinal approach control are presented to illustrate the proposed method.
INTRODUCTION
For complex control problems, the fuzzy control algorithm can be obtained without a mathematical model of the plant, but the main disadvantage of using an FLC seems to be the lack of a systematic design procedure. The general method for designing an FLC is trial and observation. No useful mathematical tool has yet been developed for FLC design because of its fuzziness, complexity, and nonparameterization. Three elements have a notable influence on the behavior of an FLC: 1) the control rules expressed in linguistic language, 2) the membership functions defined for the fuzzy sets, and 3) the scale factors attached to the input and the output. Many researchers have investigated tuning the FLC using these elements. Most of them concentrate on adjusting the control rules; these rules are tuned by introducing a self-organizing fuzzy logic controller that can change the rules with respect to the process under control and its environment. The influence of scale factor values on the system response has also been investigated by many researchers. However, proper control rules cannot always be obtained easily, and suitable scale factors cannot always be achieved.
Therefore, fixed fuzzy rules based on analyzing the behavior of a controlled process are used. For designing this kind of fuzzy logic controller, tuning the scale factors is considered one of the most important steps, especially when the FLC output and input parameters have the same range for the universe of discourse. In this paper, an optimization method is used to find the optimal values of the FLC scale factors. These values are obtained by minimizing the proposed objective functions. These functions measure the deviation from the desired set point in different forms; since the deviation depends on the selected scale factor values, the objective function is explicitly a function of the deviation and implicitly a function of the scale factors. The scale factors of the controller inputs and output may initially be given arbitrary values.
* AIAA member, Assistant Professor
FUZZY LOGIC CONTROLLER STRUCTURE
The principal approach to deriving the fuzzy control rules in this research is based on the system response of the process to be controlled, as shown in Fig. (1), where the input variables of the FLC are the error E and the change of error CE, while the FLC output is the change of process input U.
Fig. (1) System time response of a regulating system
The input universe of discourse for the tracking error E or the change of error CE is divided into 7 degrees connected with the number of fuzzy sets by membership functions. In this study, E and CE can each range from -3 to +3; the seven degrees are -3, -2, -1, 0, +1, +2, +3, and the fuzzy sets are defined as NB (Negative Big), NM (Negative Medium), NS (Negative Small), AZ (Zero), PS (Positive Small), PM (Positive Medium), and PB (Positive Big). A similar analysis is applied to the output for the control action, which uses the same fuzzy sets for the same universe of discourse.
The fuzzy rules used to implement the controller are of the standard IF...THEN type. The controller rule base consists of 49 rules of the form:
IF (E is NS) AND (CE is NM) THEN (U is NB)
...
IF (E is PM) AND (CE is NS) THEN (U is PS)
The membership functions for the fuzzy rules are illustrated in Fig. (2), and the prototype of the fuzzy control rules is tabulated in Table (1).
[Fig. (2): membership functions; axis ticks -3, -2, -1, 0, 1, 2, 3; set labels NB, NM, NS, PM, PB]
AIAA Guidance, Navigation, and Control Conference and Exhibit, 16-19 August 2004, Providence, Rhode Island. AIAA 2004-5003. Copyright © 2004 by the American Institute of Aeronautics and Astronautics, Inc. All rights reserved.
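The controller structure described above (seven fuzzy sets on [-3, +3] for E and CE, and 49 IF...THEN rules) can be sketched in Python as follows. Table (1) is not reproduced in this text, so the rule base below is filled with a saturated-sum pattern. That pattern is an assumption, although it does agree with the two rules quoted above. The triangular shape of half-width 1 and the weighted-average defuzzification over set peaks are likewise assumptions.

```python
LABELS = ["NB", "NM", "NS", "AZ", "PS", "PM", "PB"]
PEAKS = dict(zip(LABELS, range(-3, 4)))  # NB -> -3, ..., AZ -> 0, ..., PB -> +3


def membership(x, label):
    """Assumed triangular set of half-width 1 centred on the label's peak."""
    return max(0.0, 1.0 - abs(x - PEAKS[label]))


def rule_output(e_label, ce_label):
    """Assumed rule table: output index is the sum of input indices, saturated to [-3, 3]."""
    s = max(-3, min(3, PEAKS[e_label] + PEAKS[ce_label]))
    return LABELS[s + 3]


RULE_BASE = {(e, ce): rule_output(e, ce) for e in LABELS for ce in LABELS}


def flc(e, ce):
    """Evaluate all rules with min as AND and defuzzify by a weighted average of peaks."""
    e = max(-3.0, min(3.0, e))    # keep inputs inside the universe of discourse
    ce = max(-3.0, min(3.0, ce))
    num = den = 0.0
    for (e_label, ce_label), u_label in RULE_BASE.items():
        w = min(membership(e, e_label), membership(ce, ce_label))
        num += w * PEAKS[u_label]
        den += w
    return num / den


u = flc(-1.0, -2.0)  # only the rule IF (E is NS) AND (CE is NM) fires here
```

In the paper's setting, the scale factors would multiply E, CE, and U before and after this mapping; they are omitted here because their values are what the optimization is meant to find.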
- Research Article
22
- 10.4236/jsip.2011.24036
- Jan 1, 2011
- Journal of Signal and Information Processing
In recent years, the use of Fuzzy set theory has been popularised for handling overlap domains in control engineering but this has mostly been within the context of triangular membership functions. In actual practice however, such domains are hardly triangular and in fact for most engineering applications the membership functions are usually Gaussian and sometimes cosine. In an earlier paper, we derived explicit Fourier series expressions for systematic and dynamic computation of grade of membership in the overlap and non-overlap regions of triangular Fuzzy sets. In another paper, we extended the methodology to cover cases of cosine, exponential and Gaussian Fuzzy sets by presenting explicit Fourier series representation for encoding fuzziness in the overlap and non-overlap domains of Fuzzy sets. This current paper presents the development of a “Fuzzy Controller” device, which incorporates the formal mathematical representation for computing grade of membership of Gaussian and triangular Fuzzy sets. It is shown that triangular approximation of Gaussian membership function in Fuzzy control can lead to wrong linguistic classification which may have adverse effects on operational and control decisions. The development of the Fuzzy controller demonstrates that the proposed technique can indeed be incorporated in engineering systems for dynamic and systematic computation of grade of membership in the overlap and non-overlap regions of Fuzzy sets; and thus provides a basis for the design of embedded Fuzzy controller for mission critical applications.
- Research Article
4
- 10.1007/s00500-004-0403-6
- Aug 11, 2004
- Soft Computing
A fuzzy controller has many degrees of freedom in terms of its component selection (e.g., different types of fuzzy sets and different kinds of fuzzy rules). Consequently, linear or piecewise linear controllers can result unintentionally, which is undesirable, as it is irrational to use fuzzy control that way. Fuzzy controllers should be used, and taken advantage of, as nonlinear controllers only, not as linear or piecewise linear controllers. Currently, there exist no rigorous methods, analytical or otherwise, to precisely determine whether a designed fuzzy controller is nonlinear or not. In the present paper, we establish conditions under which linearity, piecewise linearity, or nonlinearity of a general class of Mamdani fuzzy controllers can be determined. These fuzzy controllers can use input fuzzy sets of any type, arbitrary fuzzy rules, arbitrary singleton output fuzzy sets, arbitrary inference methods, either the Zadeh or the product fuzzy logic AND operator, and the centroid defuzzifier. We prove that the fuzzy controllers using the Zadeh AND operator are always nonlinear, regardless of the choice of the other components. The general fuzzy controllers using the product AND operator are also always nonlinear except when all input fuzzy sets are triangular or trapezoidal and a couple of other conditions are satisfied. The exceptions lead to piecewise linear or linear controllers. A concrete example is provided to illustrate the results. Our new findings provide much-needed insight into the connections between the components and the nonlinearity of the fuzzy controllers. They enable fuzzy control developers to choose appropriate types and configurations of the components (e.g., triangular fuzzy sets instead of Gaussian ones) at the beginning of the design stage, saving design time and effort.
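The contrast between the two AND operators can be checked numerically with a small Python sketch. The configuration is an assumption picked to hit the exceptional case: evenly spaced triangular input sets that sum to one, and singleton outputs equal to the sum of the two input peaks. Under those assumptions the product operator reproduces the linear map `e + ce`, while the Zadeh (min) operator does not.

```python
PEAKS = [-3, -2, -1, 0, 1, 2, 3]  # assumed evenly spaced triangular input sets


def tri(x, peak):
    """Triangular membership of half-width 1 centred on `peak`."""
    return max(0.0, 1.0 - abs(x - peak))


def controller(e, ce, and_op):
    """Singleton-output fuzzy controller with centroid (weighted-average) defuzzifier.

    Inputs are expected in [-3, 3] so that at least one rule fires.
    """
    num = den = 0.0
    for pe in PEAKS:
        for pc in PEAKS:
            w = and_op(tri(e, pe), tri(ce, pc))
            num += w * (pe + pc)  # assumed singleton output, linear in the peaks
            den += w
    return num / den


def product_and(a, b):
    return a * b


zadeh_and = min

e, ce = 0.25, 0.5
u_product = controller(e, ce, product_and)  # equals e + ce in this configuration
u_zadeh = controller(e, ce, zadeh_and)      # deviates from e + ce
```

For the inputs shown, the product version returns 0.75, which is exactly `e + ce`, whereas the min version returns about 0.833, so the same rule base and fuzzy sets give a linear controller with one operator and a nonlinear one with the other.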
- Book Chapter
1
- 10.1007/978-981-19-8012-1_1
- Jan 1, 2023
Traditional statistical learning algorithms perform poorly when learning from an imbalanced dataset. Software defect prediction (SDP) is a useful way to identify defects in the early phases of the software development life cycle. The SDP methodology helps remove software defects and build cost-effective, good-quality software products. Several statistical and machine learning models have been employed to predict defects in software modules. However, the imbalanced nature of these datasets is one of their key characteristics and needs to be exploited for the successful development of a defect prediction model. Imbalanced software datasets contain non-uniform class distributions, with most of the instances belonging to one class rather than the other. We propose a novel hybrid model based on a Hellinger distance-based decision tree (HDDT) and an artificial neural network (ANN), which we call the hybrid HDDT-ANN model, for the analysis of SDP data. This newly developed model is found to be quite effective in predicting software bugs. A comparative study of several supervised machine learning models against our proposed model, using different performance measures, is also presented. Hybrid HDDT-ANN also exploits the strength of a skew-insensitive distance measure, known as the Hellinger distance, in handling class imbalance problems. A detailed experiment was performed on ten NASA SDP datasets to demonstrate the superiority of the proposed method.
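The skew insensitivity of the Hellinger distance mentioned above can be seen in a short Python sketch of the split criterion commonly used in Hellinger distance decision trees. The function name, the binary threshold split, and the toy data are assumptions for illustration; the hybrid HDDT-ANN model itself is not reproduced here.

```python
from math import sqrt


def hellinger_split(values, labels, threshold):
    """Hellinger distance between the class-conditional distributions of a binary split.

    Each class is normalised by its own size, so the class ratio does not enter the score.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    total = 0.0
    for in_branch in (lambda v: v <= threshold, lambda v: v > threshold):
        p = sum(1 for v in pos if in_branch(v)) / len(pos)  # share of positives in branch
        n = sum(1 for v in neg if in_branch(v)) / len(neg)  # share of negatives in branch
        total += (sqrt(p) - sqrt(n)) ** 2
    return sqrt(total)


# Toy imbalanced data: eight defect-free modules (0) and two defective ones (1).
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
labels = [0] * 8 + [1] * 2

perfect = hellinger_split(values, labels, 8.5)   # separates the classes completely
partial = hellinger_split(values, labels, 5.5)

# Tripling the majority class leaves the score unchanged.
values_skewed = values[:8] * 3 + values[8:]
labels_skewed = [0] * 24 + [1] * 2
partial_skewed = hellinger_split(values_skewed, labels_skewed, 5.5)
```

A split that separates the classes completely scores the maximum value of `sqrt(2)`, and an uninformative split scores 0, regardless of how rare the defective class is.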
- Research Article
39
- 10.1016/s0005-1098(03)00086-4
- May 21, 2003
- Automatica
A general technique for deriving analytical structure of fuzzy controllers using arbitrary trapezoidal input fuzzy sets and Zadeh AND operator
- Research Article
100
- 10.1007/s10515-016-0194-x
- Mar 22, 2016
- Automated Software Engineering
Software defect prediction can automatically identify defect-prone software modules, enabling efficient software testing. When the previous defect labels of modules are limited, predicting the defect-prone modules becomes a challenging problem. In static software defect prediction, there is similarity among software modules (a software module can be approximated by a sparse representation of the other modules) and a class-imbalance problem (the number of defect-free modules is much larger than that of defective ones). In this paper, we propose to use a graph-based semi-supervised learning technique to predict software defects. First, using a Laplacian score sampling strategy for the labeled defect-free modules, we construct a class-balanced labeled training dataset. Then, we use a nonnegative sparse algorithm to compute the nonnegative sparse weights of a relationship graph, which serve as clustering indicators. Lastly, on the nonnegative sparse graph, we use a label propagation algorithm to iteratively predict the labels of unlabeled software modules. We thus propose a nonnegative sparse graph based label propagation (NSGLP) approach for software defect classification and prediction, which uses not only a few labeled data but also abundant unlabeled ones to improve the generalization capability. We vary the proportion of labeled software modules from 10% to 30% of all the datasets in the widely used NASA projects. Experimental results show that NSGLP outperforms several representative state-of-the-art semi-supervised software defect prediction methods, and that it can fully exploit the characteristics of static code metrics and improve the generalization capability of the software defect prediction model.
- Research Article
6
- 10.1016/j.ijar.2018.10.002
- Oct 9, 2018
- International Journal of Approximate Reasoning
A fast analytical approximation type-reduction method for a class of spiked concave type-2 fuzzy sets
- Book Chapter
3
- 10.1007/978-3-030-00211-4_2
- Aug 30, 2018
A software defect is a mistake in a computer program or system that causes it to produce incorrect or unexpected results, or to behave in unintended ways. Machine learning methods are helpful in software defect prediction, despite the challenge of imbalanced software defect distributions, in which non-defective modules far outnumber defective ones. In this paper, we introduce an enhancement to the most recent hybrid SMOTE-Ensemble approach to the software defect problem, utilizing a Cost-Sensitive Learner (CSL) to improve the handling of imbalanced distributions. This paper uses four publicly available software defect datasets with different imbalance ratios, and provides a comparative performance analysis against the most recent hybrid SMOTE-Ensemble approach to predicting software defects. Experimental results show that utilizing multiple machine learning techniques to cope with imbalanced datasets improves the prediction of software defects. The results also reveal that the cost-sensitive learner performs better on highly imbalanced datasets than on mildly imbalanced ones.
- Research Article
51
- 10.1007/s00500-017-2726-0
- Jul 17, 2017
- Soft Computing
This article deals with a triangular dense fuzzy set that has a special property on Cauchy sequences. In this set, normality is never attained unless the triangular dense fuzzy set is unlocked with a special key at its final defuzzified state. We first give several definitions of triangular dense fuzzy lock sets and then discuss their locking and unlocking properties for single-key, double-key, and multiple-key environments, with special reference to the convergence of Cauchy sequences. The non-membership function of the proposed lock set is also studied. Graphical representations of the (non-)membership functions are developed, and the defuzzifications are done implicitly by existing methods for dense fuzzy sets as well as cloudy fuzzy sets. We have also extended this fuzzy lock set into a fuzzy lock matrix to generalize the concept. Finally, we discuss fields of practical application and draw conclusions.
- Conference Article
5
- 10.1109/ubmk.2018.8566479
- Sep 1, 2018
Fuzzy rule base systems are expert systems that rely on fuzzy set theory. Here, the knowledge of a human expert is transferred to the artificial model via fuzzy rules. Therefore, the preciseness, completeness, and coverage of the fuzzy rules in a fuzzy system are vital for the accuracy and plausibility of fuzzy reasoning. However, in cases where the human expert is unable to supply sufficient rules, data-based automatic rule generation methods attract attention. In this study, two linear and two evolutionary approaches to automatic fuzzy rule generation are investigated. The investigated linear solutions are the Wang-Mendel Method and E2E-HFS, while MOGUL and IVTURS-FARC are the selected evolutionary approaches. Wang-Mendel and MOGUL are commonly considered the basic methods of the groups they belong to. IVTURS-FARC is distinguished by its ability to handle interval-valued fuzzy sets. Among the rest of the algorithms, E2E-HFS is unique in its weak dependency on data, because it uses only some simple properties of the corresponding input variable. In order to compare the completeness and accuracy of the automatically generated fuzzy rules, several experiments are performed on different software defect prediction datasets, and the classification performance of the resulting fuzzy systems is evaluated. The results show that even though the training of the evolutionary approaches seems to be more precise, similar accuracy can be achieved by the linear approaches, and they perform better in the experiments on unseen data.
- Conference Article
15
- 10.1109/fuzzy.1995.409928
- Mar 20, 1995
In this paper, we propose a genetic algorithm based method for adjusting the membership functions of antecedent fuzzy sets in fuzzy rules for classification problems. The proposed method determines the fuzzy partition of a pattern space for a classification problem. This means that the number of fuzzy rules and the membership function of each antecedent fuzzy set are simultaneously determined. First we describe how a fuzzy partition of a pattern space is denoted by a string that can be handled in genetic algorithms. In this coding, each axis of a pattern space is partitioned by triangular fuzzy sets and trapezoid fuzzy sets. This coding can also employ the whole domain of each attribute as an antecedent fuzzy set. Next, we show genetic operators for adjusting the membership function of each antecedent fuzzy set. Finally, we demonstrate that our genetic algorithm can construct a classification system with high classification power.
- Research Article
7
- 10.32629/jai.v6i1.559
- Jun 16, 2023
- Journal of Autonomous Intelligence
Software defect prediction (SDP) is an essential task for developing quality software, and various models have been developed for this purpose. However, the imbalanced nature of software defect datasets has challenged these models, resulting in decreased performance. To address this challenge, the author has proposed a hybrid machine learning model that combines the Synthetic Minority Oversampling Technique (SMOTE) with a Support Vector Machine (SVM), called the SMOTE-SVM (S-SVM) model. The author has empirically examined SDP using multiple datasets (CM1, PC1, JM1, PC3, KC1, EQ, and JDT) from the PROMISE and AEEEM repositories. In the experimental study, the S-SVM model was trained and compared with previously developed models on balanced and imbalanced test datasets using four evaluation metrics: Precision, Recall, F1 score, and Accuracy. For the balanced dataset, the S-SVM model achieved precision values ranging from 70 to 96, recall values ranging from 52 to 94, F1-score values ranging from 67 to 90, and accuracy values ranging from 69 to 98. For the imbalanced dataset, the S-SVM model achieved precision values ranging from 60 to 93, recall values ranging from 64 to 97, F1-score values ranging from 69 to 91, and accuracy values ranging from 67 to 87. The proposed S-SVM model outperforms other models in classifying and predicting software defects. Therefore, the hybridisation of SMOTE and SVM improved the model's ability to categorise and predict on balanced and imbalanced datasets when sufficient defective and non-defective data is provided.
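The oversampling step named above can be illustrated with a minimal, standard-library Python sketch of the core SMOTE idea: each synthetic minority sample is an interpolation between a minority point and one of its nearest minority neighbours. The function name, the default `k`, the fixed seed, and the toy points are assumptions; the SVM stage and the paper's actual configuration are not shown.

```python
import random
from math import dist


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    Requires at least two minority points.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x, excluding x itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to the neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synthetic


# Hypothetical defective-module feature vectors (two metrics each).
defective = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
new_points = smote(defective, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority points, the oversampled class stays inside the region already occupied by defective modules rather than duplicating individual samples.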
- Research Article
114
- 10.1016/j.neucom.2018.04.090
- Feb 4, 2019
- Neurocomputing
An empirical study to investigate oversampling methods for improving software defect prediction using imbalanced data
- Research Article
- 10.30574/ijsra.2024.12.2.1518
- Aug 30, 2024
- International Journal of Science and Research Archive
Software defects and quality assurance are crucial aspects of software development that should be considered during the software development cycle. To ensure high-quality software, it is essential to have a robust quality assurance process in place. System reliability and quality are key components that must be considered during software development, and they can only be achieved when software undergoes a thorough test process for errors, anomalies, defects, omissions, and bugs. Early software defect prediction and detection play an essential role in ensuring the reliability and quality of software systems, helping software companies discover errors or defects early enough and allocate more resources to defect-prone modules. This study proposes the development of an enhanced classifier model for software defect prediction and detection. The aim is to harness the collective intelligence of selected base classifiers, namely Support Vector Machine, Logistic Regression, Decision Trees, Random Forest, AdaBoost, Gradient Boosting, K-Nearest Neighbor, GaussianNB, and Multi-Layer Perceptron, to improve accuracy, robustness, and generalization in identifying potential defects using a soft voting ensemble technique. The ensemble model leveraged the confidence probability of the soft voting technique and the generalization advantage of cross-validation, leading to a more robust and dynamic model. The performance of the model relative to existing classifiers was evaluated using accuracy, F1 score, precision, and area under the ROC curve (ROC-AUC) as the evaluation metrics. The results of the experiment revealed that the proposed classifier produced an overall accuracy of 93% and an ROC-AUC of 98%. The results demonstrate the effectiveness of our enhanced ensemble classifier in software defect detection and prediction.
By harnessing the strengths of diverse base classifiers, our approach provides a robust and adaptive solution to the challenges of early detection and mitigating defects in software systems. This research contributes to the advancement of reliable software development practices and lays the foundation for future enhancements in ensemble-based defect detection methodologies.
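The soft voting technique referred to above can be sketched in a few lines of Python: the class probabilities reported by each base classifier are averaged, and the class with the highest mean probability wins. The function name, the optional weights, and the three probability vectors are hypothetical and stand in for the outputs of trained base classifiers.

```python
def soft_vote(probabilities, weights=None):
    """Average per-class probabilities across models and return (winning class, averages)."""
    weights = weights or [1.0] * len(probabilities)
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=averaged.__getitem__)
    return winner, averaged


def hard_vote(probabilities):
    """Majority vote over each model's most probable class, shown for comparison."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probabilities]
    return max(set(votes), key=votes.count)


# Hypothetical outputs of three base classifiers for one module: [P(clean), P(defective)].
outputs = [[0.90, 0.10], [0.40, 0.60], [0.45, 0.55]]

soft_label, mean_probs = soft_vote(outputs)  # confidence-weighted decision
hard_label = hard_vote(outputs)              # simple majority decision
```

In this toy case two of the three classifiers lean towards "defective", so hard voting picks class 1, but the first classifier is far more confident, and soft voting therefore picks class 0. That sensitivity to confidence is the property the abstract attributes to the soft voting ensemble.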