Abstract

In this thesis we compare the data mining tool KNIME and the Knowledge Flow graphical environment of WEKA, both theoretically and experimentally, in order to find a model for predicting the duration of digitization of archival material (files) at the company Archeiothiki S.A. The prediction model is built with regression, using the KNN, SVM, Random Forest, Decision Tree and Linear Regression algorithms on a data set provided by the company itself. According to our experimental results, WEKA and KNIME give equally good predictions, with WEKA offering more algorithms for this particular mining technique. KNIME provides a more intuitive user interface: the user can work with the workflow quickly and easily, without consciously thinking about how to do it, so the flow is understandable even to novice users. Results may differ depending on the algorithm applied, but in our experiments the Random Forest and Decision Tree algorithms gave the best results, based on features such as user, weeks, number of documents and number of pages of each folder.
