Automatic clustering of construction project documents based on textual similarity

Mohammed Al Qady, Amr Kandil

doi:10.1016/j.autcon.2014.02.006

Abstract

Text classifiers, as supervised learning methods, require a comprehensive training set that covers all classes in order to classify new instances. This limits their use for organizing construction project documents, since sufficient samples are not guaranteed to be available for every possible document category. To overcome this all-inclusive requirement, an unsupervised learning method was used to automatically cluster documents based on textual similarity. Repeated evaluations using different randomizations of the dataset revealed a region of threshold/dimensionality values that yields consistently high precision and average recall. Accordingly, a hybrid approach was proposed that first uses an unsupervised method to develop core clusters and then trains a text classifier on those core clusters to classify outlier documents in a subsequent refinement step. Evaluation of the hybrid approach demonstrated a significant improvement in recall, resulting in an overall increase in F-measure scores.
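The two-stage idea in the abstract (form core clusters without labels, then train a classifier on those clusters to place the outliers) can be illustrated with a small sketch. The abstract does not name the specific algorithms, so everything here is an assumption made for illustration: bag-of-words term counts as the document representation, cosine similarity with a single-pass threshold rule for the clustering step, and a nearest-centroid rule as the text classifier. The dimensionality-reduction step implied by "threshold/dimensionality" is omitted, and the function names, the `min_size` parameter, and the sample documents are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    # Assumption: plain term counts stand in for the paper's document representation.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0

def centroid(vectors):
    # Sum of member vectors; cosine similarity ignores the scale.
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def core_clusters(vectors, threshold, min_size=2):
    # Single-pass clustering: a document joins the most similar existing
    # cluster if the similarity reaches the threshold, else starts a new one.
    clusters = []
    for i, v in enumerate(vectors):
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(v, centroid([vectors[j] for j in c]))
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    # Clusters below min_size are treated as outliers (hypothetical rule).
    cores = [c for c in clusters if len(c) >= min_size]
    outliers = [i for c in clusters if len(c) < min_size for i in c]
    return cores, outliers

def refine(vectors, cores, outliers):
    # Refinement step: a nearest-centroid classifier "trained" on the core
    # clusters assigns each outlier to its most similar core cluster.
    result = [list(c) for c in cores]
    centroids = [centroid([vectors[j] for j in c]) for c in cores]
    if not centroids:
        return result  # nothing to train on; outliers stay unassigned
    for i in outliers:
        sims = [cosine(vectors[i], c) for c in centroids]
        result[sims.index(max(sims))].append(i)
    return result

# Made-up sample documents, for illustration only.
docs = [
    "request for information concrete slab reinforcement",
    "request for information concrete slab thickness",
    "change order electrical conduit cost",
    "change order electrical panel cost",
    "concrete slab curing question",
]
vectors = [vectorize(d) for d in docs]
cores, outliers = core_clusters(vectors, threshold=0.6)
final = refine(vectors, cores, outliers)
print(cores, outliers)  # [[0, 1], [2, 3]] [4]
print(final)            # [[0, 1, 4], [2, 3]]
```

In this toy run the strict threshold keeps the core clusters pure but leaves the fifth document unassigned, which mirrors the high-precision, average-recall behaviour the abstract reports; the refinement step then recovers it, which is the kind of recall gain the hybrid approach targets.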

Journal: Automation in Construction
Publication Date: Mar 15, 2014
Citations: 47
