Automated Dataset Construction from Web Resources with Tool Kayur

Alexander Kohan, Cyrille Valentin Artho, Mitsuharu Yamamoto

doi:10.15803/ijnc.7.2_271

Abstract

Many text mining tools cannot be applied directly to documents available on web pages. Tools exist for fetching and preprocessing textual data, but combining them with a data processing tool into one working tool chain can be time-consuming. The preprocessing task is even more labor-intensive when documents are located on multiple remote sources with different storage formats. In this paper, we propose simplifying the data preparation process for cases where data come from a wide range of web resources. We developed an open-source tool, called Kayur, that greatly reduces the time and effort required for routine data preprocessing steps, allowing users to proceed quickly to the main task of data analysis. The datasets generated by the tool are ready to be loaded into a data mining workbench, such as WEKA or Carrot2, to perform classification, feature prediction, and other data mining tasks.

Journal: International Journal of Networking and Computing
Publication Date: Jan 1, 2017
License type: free