Identifying the class of operational situation to which the current state of an environmental system (ES) belongs is a key step in building successful environmental decision support systems (EDSS). This diagnosis phase is especially difficult because of the many features involved in most environmental systems. Environmental managers cannot easily acquire, integrate and understand the growing amount of data obtained from an environmental process, nor extract meaningful knowledge from it. Thus, a deeper classification task in an EDSS requires full integration of the gathered data, including the use of statistics, pattern recognition, clustering techniques, similarity-based reasoning and other advanced information technology techniques. Consequently, automatic knowledge acquisition and management methods are needed to build consistent and robust decision support systems. Additionally, some environmental problems can only be solved by experts who draw on their own experience in resolving similar situations. This is why many artificial intelligence (AI) techniques have been used in recent years to solve these classification tasks. The integration of AI techniques has led to more accurate and reliable EDSS. Case-based reasoning (CBR) is a good technique for solving new problems on the basis of previous experience. The main assumption in CBR is that similar problems should have similar solutions. When working with labelled cases, the retrieval step in the CBR cycle can be seen as a classification task: a new case is labelled (classified) with the label (class) of the most similar case retrieved from the case base. In environmental systems, these classes are operational situations. Thus, similarity measures are key elements in obtaining a reliable classification of new situations. 
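The view of CBR retrieval as classification can be illustrated with a minimal sketch. It assumes numeric feature vectors and uses plain Euclidean and Manhattan distances as stand-ins; it is not the measure proposed in this paper, and the function names, feature values and situation labels are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A case pairs a feature vector with an operational-situation label.
Case = Tuple[Sequence[float], str]
Distance = Callable[[Sequence[float], Sequence[float]], float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, used here only as a stand-in dissimilarity."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (city-block) distance, a second stand-in for comparison."""
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve_and_classify(
    query: Sequence[float],
    case_base: List[Case],
    distance: Distance = euclidean,
) -> str:
    """Return the label of the stored case closest to the query."""
    _, label = min(case_base, key=lambda case: distance(query, case[0]))
    return label


# Hypothetical case base: two made-up features per case, made-up labels.
case_base: List[Case] = [
    ((0.2, 7.1), "normal"),
    ((0.9, 5.4), "bulking"),
    ((0.5, 8.0), "overload"),
]

print(retrieve_and_classify((0.85, 5.6), case_base))
print(retrieve_and_classify((0.85, 5.6), case_base, distance=manhattan))
```

Because the distance is passed in as a parameter, different similarity measures can be swapped in and compared on the same case base without changing the retrieval code.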
This paper presents a comparative analysis of several commonly used similarity measures and a study of their performance on classification tasks. In addition, it introduces L’Eixample distance, a new similarity measure for case retrieval. In testing, this measure achieved good accuracy, improving the performance of the classification task. The tests were carried out on two environmental data sets and on other data sets from the UCI Machine Learning Database Repository.