An Approach Towards Reducing Training Time of the Input Doubling Method via Clustering for Middle-Sized Data Analysis

Ivan Izonin, Roman Tkachenko, Kyrylo Yemets, Michal Gregus, Yevhen Tomashy, Iryna Pliss

doi:10.1016/j.procs.2024.08.007

Abstract

Intelligent analysis of small and middle-sized datasets with machine learning tools remains challenging in various application domains. Existing methods fail to provide sufficient accuracy, and their use raises a range of issues during data analysis. This paper proposes an improvement to the input doubling method for middle-sized data analysis. The existing method employs an augmentation procedure in which the size of the augmented data sample grows quadratically with the size of the original sample. This imposes several limitations on the method's usage for middle-sized data analysis. The authors propose enhancing this method by introducing an additional clustering procedure during data augmentation. The training and application algorithms of the improved method are described, and its main operating steps are visualized. Modeling is performed on two middle-sized datasets. Optimal parameters for the improved method are selected, demonstrating its high efficiency. Specifically, the volumes of the augmented datasets are reduced significantly (by a factor of 8 to 9 for both datasets), and the duration of the training procedure is reduced substantially (by factors of more than 100 and 260 for the two datasets, respectively), while high accuracy is maintained.
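The size argument behind the proposal can be illustrated with a minimal sketch. The abstract does not specify the pairing rule, the target of the augmented samples, or the clustering algorithm, so everything below is an assumption made for illustration: each augmented row concatenates two feature vectors, its target is the difference of the two original targets, pairs are formed only within a cluster, and the cluster labels are hypothetical and given by hand. The function names `double_inputs` and `double_inputs_by_cluster` are likewise hypothetical.

```python
from itertools import product


def double_inputs(X, y):
    """Pair every sample with every sample: n samples give n * n augmented rows.

    Assumption: an augmented row is the concatenation of two feature vectors,
    and its target is the difference of the two original targets.
    """
    pairs = []
    for (xi, yi), (xj, yj) in product(list(zip(X, y)), repeat=2):
        pairs.append((xi + xj, yi - yj))  # list concatenation, target difference
    return pairs


def double_inputs_by_cluster(X, y, labels):
    """Apply the same pairing, but only among samples sharing a cluster label.

    Assumption: clustering restricts pairing to within-cluster pairs, so the
    augmented size is the sum of squared cluster sizes instead of n * n.
    """
    groups = {}
    for xi, yi, c in zip(X, y, labels):
        groups.setdefault(c, []).append((xi, yi))
    pairs = []
    for members in groups.values():
        Xc = [m[0] for m in members]
        yc = [m[1] for m in members]
        pairs.extend(double_inputs(Xc, yc))
    return pairs


# Toy data: six samples, hypothetical labels for three clusters of two.
X = [[0.0, 1.0], [0.1, 0.9], [5.0, 5.0], [5.2, 4.8], [9.0, 0.0], [9.1, 0.2]]
y = [1.0, 1.1, 3.0, 3.2, 7.0, 7.1]
labels = [0, 0, 1, 1, 2, 2]

full = double_inputs(X, y)
clustered = double_inputs_by_cluster(X, y, labels)
print(len(full), len(clustered))  # 36 12
```

With k equally sized clusters, the within-cluster variant of this sketch produces n²/k rows rather than n², which is the kind of reduction in augmented-sample volume the abstract reports. The actual procedure and the reported factors are those of the paper, not of this toy example.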

Full Text