Abstract

Digital images are extensively used to increase the accuracy and timeliness of progress reports, safety training, requests for information (RFIs), productivity monitoring, and claims and litigation. While these images can be sorted using date and time tags, searching an image dataset for specific visual content is not trivial. In pattern recognition, generating metadata tags that describe image contents (objects, scenes) or appearance (colors, context) is referred to as multi-label image annotation. Given the large number and diversity of construction images, it is desirable to generate image tags automatically. Previous work has applied pattern matching to synthetic images or images obtained from constrained settings. In this paper, we present deep learning (particularly, transfer learning) algorithms to annotate construction imagery from unconstrained real-world settings with high fidelity. We propose convolutional neural network (CNN)-based algorithms that take RGB values as input and output the labels of detected objects. Specifically, we investigate two categories of classification tasks: single-label classification, in which a single class (among multiple predefined classes) is assigned to an image, and multi-label classification, in which a set of (one or more) classes is assigned to an image. For both cases, the VGG-16 model, pre-trained on the ImageNet dataset, is trained on construction images retrieved with web mining techniques and labeled by human annotators. Testing the trained model on previously unseen photos yields an accuracy of ~90% for single-label classification and ~85% for multi-label classification, indicating the high sensitivity and specificity of the designed methodology in reliably identifying the contents of construction imagery.
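
The paper's implementation details are not reproduced on this page, but the abstract fully determines the shape of the setup, so here is a minimal Keras sketch of the described transfer learning. The class count, custom head, and hyperparameters are illustrative assumptions, not the authors' values; the only structural difference between the two task variants is the output activation and loss (softmax with categorical cross-entropy for single-label, sigmoid with binary cross-entropy for multi-label).

```python
# Minimal sketch of the transfer-learning setup described in the abstract,
# written with the Keras API. NUM_CLASSES, the custom head, and all
# hyperparameters are illustrative assumptions, not the authors' values.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Model

NUM_CLASSES = 3  # hypothetical classes, e.g., building / equipment / worker

def build_classifier(multi_label: bool) -> Model:
    # VGG-16 convolutional base pre-trained on ImageNet; the original
    # fully-connected head is dropped and replaced below.
    base = VGG16(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
    base.trainable = False  # freeze conv layers: classic transfer learning

    x = Flatten()(base.output)
    x = Dense(256, activation="relu")(x)

    if multi_label:
        # Multi-label: each class is scored independently, so one photo
        # can receive several tags at once.
        outputs = Dense(NUM_CLASSES, activation="sigmoid")(x)
        loss = "binary_crossentropy"
    else:
        # Single-label: classes are mutually exclusive.
        outputs = Dense(NUM_CLASSES, activation="softmax")(x)
        loss = "categorical_crossentropy"

    model = Model(base.input, outputs)
    model.compile(optimizer="adam", loss=loss, metrics=["accuracy"])
    return model
```

Scoring each class with an independent sigmoid is the standard way to let one photo carry several tags at once, which is exactly what multi-label annotation of cluttered construction scenes requires.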

Highlights

  • Construction site imagery is valuable for creating progress reports, requests for information (RFIs), safety training, productivity monitoring, and claims and litigation

  • The designed convolutional neural network (CNN) is applied to the Pictor v.1.0 and Pictor v.1.1 datasets

  • The CNN model (VGG-16) takes an RGB image as input, generates intermediate features through a series of convolution and max-pooling operations, passes the features to the fully-connected layers, and outputs the probabilities of the image belonging to each class (see the inference sketch after this list)
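
As a concrete illustration of the forward pass described in the last highlight, the snippet below runs one photo through the model: RGB image in, per-class probabilities out. The file name and class names are placeholders, and `build_classifier` is the hypothetical helper from the sketch after the abstract.

```python
# Illustration of the forward pass in the highlight above: RGB image in,
# per-class probabilities out. File name, class names, and the
# build_classifier helper (from the earlier sketch) are hypothetical.
import numpy as np
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image

CLASS_NAMES = ["building", "equipment", "worker"]  # hypothetical labels

model = build_classifier(multi_label=False)  # untrained here; shapes only

img = image.load_img("site_photo.jpg", target_size=(224, 224))  # RGB input
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

probs = model.predict(x)[0]  # output of the final softmax layer
for name, p in zip(CLASS_NAMES, probs):
    print(f"{name}: {p:.2f}")
```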


Introduction

Construction site imagery is valuable for creating progress reports, requests for information (RFIs), safety training, productivity monitoring, and claims and litigation. With the advent of digital cameras and, more recently, drones, digital images can be readily captured from jobsites and used to increase the accuracy and timeliness of decision-making in construction. Captured images, despite being abundant, rarely contain rich metadata other than date, time, and (in some cases) location information. Retrieving desired information or specific visual content from a large image collection may therefore turn into a non-trivial, resource-intensive task that can only be completed manually. A potential remedy to this problem is to create a semantic structure for the image collection, for instance by using metadata tags describing content (e.g., objects, scenes) and appearance (e.g., color, context). Given the large number and diversity of construction site images, manual tagging is time-consuming and effortful, rendering the automatic generation of metadata an appealing solution.
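
To make the idea of a tag-based semantic structure concrete, here is a hedged sketch of how multi-label predictions could be thresholded into metadata tags and collected into a searchable index. The tag vocabulary, threshold, and file paths are illustrative assumptions, and `model` is assumed to be the multi-label variant sketched earlier.

```python
# Hedged sketch of a tag-based semantic structure: threshold multi-label
# predictions into metadata tags and build a searchable index. Vocabulary,
# threshold, and paths are illustrative; `model` is assumed to be the
# multi-label variant sketched earlier.
import numpy as np
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image

TAGS = ["building", "equipment", "worker"]  # hypothetical tag vocabulary
THRESHOLD = 0.5                             # keep tags scored above this

def tag_image(model, path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    probs = model.predict(x)[0]             # one sigmoid score per tag
    return [t for t, p in zip(TAGS, probs) if p >= THRESHOLD]

def build_index(model, paths):
    # Map each photo to its predicted tags: the metadata layer that makes
    # the collection searchable by visual content rather than by date.
    return {p: tag_image(model, p) for p in paths}

def search(index, query_tag):
    return [p for p, tags in index.items() if query_tag in tags]

# Usage (paths are placeholders):
#   index = build_index(model, ["img_001.jpg", "img_002.jpg"])
#   search(index, "worker")   # -> photos predicted to contain workers
```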
