Abstract

K-Nearest Neighbor (KNN) is considered one of the simplest machine learning algorithms. Although it is easy to implement, KNN is computationally expensive and therefore slow at prediction time. KNN is known as a lazy learning method: it does not build a generalized model from the training data but instead retains the entire training set and consults it for every test query. This paper aims to optimize the KNN classifier for page block classification by parallelizing the algorithm. The part of the algorithm that is parallelized is the outer loop, in which the test data are divided among the available processors. In this work, we use parallel KNN to classify page blocks. Page blocks are the blocks of a page layout detected by a segmentation technique, and the KNN classifier is trained to decide whether a block is a vertical line, picture, text, horizontal line, or graphic. The experiments show that the KNN classifier achieves an accuracy of 93.51%; with 8 processors and the number of grids increased to 6040, the parallel KNN achieves a speedup of 4.64 times and an efficiency of 57.96% while obtaining the same accuracy as the serial version.
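
The following is a minimal sketch of the parallelization strategy described above: the test set is split into one chunk per processor and each worker runs an ordinary serial KNN prediction on its chunk. The dataset, the value of k, and the worker count are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np
from multiprocessing import Pool

def knn_predict_chunk(args):
    """Classify one chunk of test points with plain majority-vote KNN."""
    X_train, y_train, X_chunk, k = args
    preds = []
    for x in X_chunk:
        # Euclidean distance from the query point to every training point.
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]          # indices of the k closest points
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # majority vote
    return preds

def parallel_knn(X_train, y_train, X_test, k=5, n_procs=8):
    """Split the test data into n_procs chunks and classify them in parallel."""
    chunks = np.array_split(X_test, n_procs)
    tasks = [(X_train, y_train, chunk, k) for chunk in chunks]
    with Pool(processes=n_procs) as pool:
        results = pool.map(knn_predict_chunk, tasks)
    # Concatenate per-chunk predictions back into the original test order.
    return np.concatenate([np.asarray(r) for r in results])

if __name__ == "__main__":
    # Synthetic data standing in for the page-blocks features (assumption).
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(1000, 10))
    y_train = rng.integers(0, 5, size=1000)   # e.g. 5 block classes
    X_test = rng.normal(size=(200, 10))
    print(parallel_knn(X_train, y_train, X_test, k=5, n_procs=8)[:10])
```

Because each test point is classified independently, distributing the test set across processors requires no communication between workers, which is why this outer loop is the natural place to parallelize.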
