Abstract

In this paper, the Latin Hypercube Sampling (LHS) method is evaluated for its effectiveness in supervised machine learning procedures. Employing LHS saves processing time and, owing to the Latin hypercube design properties and its space-filling ability, it is considered one of the most advanced sampling mechanisms. Although more data usually deliver better results, LHS techniques can produce outputs of the same quality with less data, thereby reducing storage cost and training time. Conditioned Latin Hypercube Sampling (cLHS) is one such technique that performs well in supervised machine learning tasks. Unfortunately, the minimum sufficient training dataset size cannot be known in advance. In this case, progressive sampling is recommended, since it begins with a small sample and progressively increases its size until model accuracy no longer improves. Combining Latin hypercube sampling with the idea of sequentially increasing the sample size, we test Progressive Latin Hypercube Sampling (PLHS) while monitoring the performance of the sampling-based training as the sample size grows. The PLHS and cLHS algorithms are applied to datasets with discrete variables, ensuring that each sample retains the Latin hypercube design properties and preserves the principal space-filling ability of LHS, as illustrated in the corresponding sample projection diagrams. The performance of the above LHS methods in supervised machine learning is evaluated by the degree to which the model is trained, verified through the accuracy of the confusion matrices produced on the test sets. The results obtained with the above Latin Hypercube Sampling techniques, compared against a benchmark sampling method, empirically show that the machine learning training process becomes less costly while remaining reliable.
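
The following sketch illustrates the progressive-sampling idea described above using Latin hypercube designs. It is a minimal, simplified example, not the authors' PLHS or cLHS algorithms: at each stage a fresh LHS design is drawn in the scaled feature space, the nearest dataset rows are selected as the training subsample, and the sample size is doubled until test accuracy stops improving. The dataset, classifier, starting size, and tolerance are all placeholder assumptions.

```python
# Minimal sketch (assumptions noted above): progressive sampling with Latin
# hypercube designs, not the authors' PLHS/cLHS implementations.
import numpy as np
from scipy.stats import qmc
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

X, y = load_digits(return_X_y=True)                          # placeholder dataset
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Map the pool features into the unit hypercube so they can be matched
# against LHS design points.
X_unit = MinMaxScaler().fit_transform(X_pool)

def lhs_subsample(n, seed=0):
    """Pick n pool rows closest to the points of an n-run Latin hypercube design."""
    design = qmc.LatinHypercube(d=X_unit.shape[1], seed=seed).random(n)
    chosen, available = [], np.ones(len(X_unit), dtype=bool)
    for p in design:
        dists = np.linalg.norm(X_unit - p, axis=1)
        dists[~available] = np.inf                            # sample without replacement
        idx = int(np.argmin(dists))
        chosen.append(idx)
        available[idx] = False
    return np.array(chosen)

n, prev_acc, tol = 100, 0.0, 0.005                            # assumed start size and tolerance
while n <= len(X_pool):
    idx = lhs_subsample(n)
    model = RandomForestClassifier(random_state=0).fit(X_pool[idx], y_pool[idx])
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"sample size {n:5d}  test accuracy {acc:.3f}")
    if acc - prev_acc < tol:                                  # accuracy has plateaued; stop growing
        break
    prev_acc, n = acc, n * 2
```

In this simplified form each stage draws a new design rather than extending the previous one, whereas a true PLHS scheme preserves the Latin hypercube property across the progressively growing sample; the stopping rule based on a small accuracy tolerance mirrors the plateau criterion described in the abstract.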
