Abstract

Regression modeling on small data sets is an important task in many industries. It is especially critical in medicine, where historical data are often insufficient for effective intelligent data analysis. The few modern approaches to this problem do not always give satisfactory results. Moreover, existing tools tend to fall into two groups. Very simple tools can lead to erroneous predictions. Fairly complex tools can suffer from overfitting and require selecting many additional parameters, defining distribution laws for data augmentation techniques, and so on. This chapter proposes universal intraensemble methods to improve the accuracy of processing short medical data sets. The main idea of the method is to combine data augmentation with elements of ensemble learning. This approach improves the generalization properties of the method and therefore increases prediction accuracy. The method can be built on arbitrary nonlinear computational intelligence tools. The authors present several algorithmic implementations of the proposed method using both machine learning algorithms and artificial neural networks. Experimental modeling was performed on two short data sets from different fields of medicine. High prediction accuracy was achieved at the cost of a significant increase in the training time of each algorithm. This increase is caused both by the substantially larger sample size and by the need to process twice as many inputs. Experiments showed that each of the developed algorithms is more accurate than the basic tool on which its algorithmic implementation is built. The method can operate on very short data samples. In addition, it scales through the use of a wide range of nonlinear computational intelligence tools based on machine learning principles.
The method can be applied in any field where classification or regression tasks must be solved on short and very short data sets.
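The abstract does not spell out the algorithm. The sketch below is a hypothetical illustration only, not the authors' implementation. It assumes one common way to combine augmentation with ensemble-style averaging that matches the two cost factors the abstract names: a larger sample and twice as many inputs. Each ordered pair of training vectors (x_i, x_j) becomes one sample. Its inputs are the two vectors concatenated (2d columns), and its target is y_i - y_j, so N samples grow to N * N. To predict for a new x, the model pairs x with every training point x_j, adds each predicted difference to y_j, and averages the N estimates. The names `pairwise_augment`, `RBFKernelRidge`, and `predict_intraensemble` are assumptions. The NumPy kernel ridge regressor only stands in for "an arbitrary nonlinear computational intelligence tool".

```python
import numpy as np


def pairwise_augment(X, y):
    """Hypothetical augmentation: every ordered pair (i, j) becomes one sample.

    Inputs are X[i] and X[j] concatenated (2 * d columns); the target is
    y[i] - y[j]. N training samples become N * N augmented samples.
    """
    n = X.shape[0]
    i_idx, j_idx = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    i_idx, j_idx = i_idx.ravel(), j_idx.ravel()  # row k = i * n + j
    X_aug = np.hstack([X[i_idx], X[j_idx]])
    y_aug = y[i_idx] - y[j_idx]
    return X_aug, y_aug


class RBFKernelRidge:
    """Minimal RBF kernel ridge regressor (stand-in for any nonlinear model)."""

    def __init__(self, gamma=1.0, alpha=1e-3):
        self.gamma = gamma  # RBF width parameter
        self.alpha = alpha  # ridge regularization strength

    def _kernel(self, A, B):
        # Squared Euclidean distances, clipped at 0 to absorb rounding error.
        sq = (np.sum(A**2, axis=1)[:, None]
              + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T)
        return np.exp(-self.gamma * np.maximum(sq, 0.0))

    def fit(self, X, y):
        self.X_ = X
        K = self._kernel(X, X)
        self.coef_ = np.linalg.solve(K + self.alpha * np.eye(len(X)), y)
        return self

    def predict(self, X):
        return self._kernel(X, self.X_) @ self.coef_


def predict_intraensemble(model, X_train, y_train, X_new):
    """Average the N estimates y_j + predicted (y(x) - y_j) over the training set."""
    preds = []
    for x in X_new:
        pairs = np.hstack([np.tile(x, (len(X_train), 1)), X_train])
        estimates = y_train + model.predict(pairs)
        preds.append(estimates.mean())
    return np.array(preds)


if __name__ == "__main__":
    # Synthetic "short" data set: 20 training and 5 test samples.
    rng = np.random.default_rng(0)
    X = rng.uniform(-1.0, 1.0, size=(25, 3))
    y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.normal(size=25)
    X_tr, y_tr, X_te, y_te = X[:20], y[:20], X[20:], y[20:]

    X_aug, y_aug = pairwise_augment(X_tr, y_tr)  # 400 samples, 6 inputs
    model = RBFKernelRidge(gamma=0.5, alpha=1e-2).fit(X_aug, y_aug)
    y_hat = predict_intraensemble(model, X_tr, y_tr, X_te)
    print("test MAE:", np.mean(np.abs(y_hat - y_te)))
```

The quadratic growth of the augmented set (20 samples become 400) and the doubled input width show why training time rises sharply under this assumed scheme. Any regressor with `fit`/`predict` could replace `RBFKernelRidge`, which mirrors the abstract's claim that the method can be built on arbitrary nonlinear tools.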
