A methodology for evaluating feature selection and clustering methods with project-specific requirements

H Von Linde, O Riedel

doi:10.1080/00207543.2024.2384597

Abstract

This paper describes a methodology for ranking feature selection and clustering methods according to user-specific preferences while taking data properties into account. For ease of reference, the methodology is called the Two Machine Learning Procedures, Preferences and Properties (2ML3P) methodology. It aims to support users from multiple domains, such as engineers, who have little expertise in machine learning (ML). It is also independent of the disciplinary core competencies of the manufacturer, with a strong focus on applicability in small and mid-sized enterprises (SMEs). The paper describes the foundation of the methodology for evaluating combinations of the two classes of ML methods, focusing on a range of feature selection and clustering methods, their limitations, and their challenges. It covers the concept phase by defining the inputs: the specific characteristics of the ML method classes, the properties of the production data, and the user preferences. User preferences are integrated as valid input by means of established decision-making methods such as the analytic hierarchy process (AHP) and the technique for order preference by similarity to ideal solution (TOPSIS). The scientific contribution is an approach that includes user preferences and specific data properties in the selection of two ML methods. As digitalisation progresses, making data-driven decisions in production and logistics is a goal for many SMEs. The methodology can support a data-driven decision-aid model by providing a guided procedure that requires relatively little ML knowledge on the part of the engineer. It allows users to select the best-suited combination of ML methods for a clustering use case.
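
To illustrate how AHP and TOPSIS are commonly combined, the following minimal Python sketch derives criterion weights from a pairwise comparison matrix (AHP, approximated here with the row geometric-mean method) and then ranks alternatives by their closeness to an ideal solution (TOPSIS). It is not the 2ML3P implementation. The criteria (cluster quality, runtime, interpretability), the candidate method combinations A, B and C, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def topsis(matrix, weights, benefit):
    """Return one TOPSIS closeness score per alternative (higher is better)."""
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix
    ]
    # Ideal and anti-ideal value per criterion (max for benefit, min for cost).
    cols = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)
        d_worst = math.dist(row, anti)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical user preferences: quality is twice as important as runtime
# and four times as important as interpretability.
pairwise = [
    [1.0, 2.0, 4.0],
    [1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 2, 1.0],
]
weights = ahp_weights(pairwise)

# Hypothetical scores for three method combinations A, B, C.
# Columns: cluster quality (benefit), runtime in seconds (cost),
# interpretability on a 1-3 scale (benefit).
alternatives = ["A", "B", "C"]
decision_matrix = [
    [0.8, 10.0, 3.0],
    [0.6, 30.0, 2.0],
    [0.7, 20.0, 3.0],
]
benefit = [True, False, True]

scores = topsis(decision_matrix, weights, benefit)
ranking = [name for _, name in sorted(zip(scores, alternatives), reverse=True)]
print(ranking)
```

In this sketch the user's preferences enter only through the pairwise comparison matrix, while the properties of each candidate enter through the decision matrix. A full AHP application would also check the consistency of the pairwise judgements, which is omitted here.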

Full Text