Understanding quality of analytics trade-offs in an end-to-end machine learning-based classification system for building information modeling

Minjung Ryu, Hong-Linh Truong, Matti Kannala

doi:10.1186/s40537-021-00417-x

Open Access

https://doi.org/10.1186/s40537-021-00417-x

Journal: Journal of Big Data	Publication Date: Feb 15, 2021
Citations: 11	License type: open-access

Affiliation: Solpros (Finland), Aalto University

Abstract

Optimizing quality trade-offs in an end-to-end big data science process is challenging, as not only do we need to deal with different types of software components, but domain knowledge also has to be incorporated along the process. This paper focuses on methods for tackling quality trade-offs in a common data science process for classifying Building Information Modeling (BIM) elements, an important task in the architecture, engineering, and construction industry. Due to the diversity and richness of building elements, machine learning (ML) techniques have been increasingly investigated for classification tasks. However, ML-based classification faces many issues in an end-to-end BIM data analysis with respect to the vast amount of data with heterogeneous data quality, diverse underlying computing configurations, and complex integration with industrial BIM tools. In this paper, we develop an end-to-end ML classification system in which quality of analytics is treated as a first-class feature across different phases, from data collection, feature processing, and training to ML model serving. We present our method for studying the quality of analytics trade-offs and carry out experiments with BIM data extracted from Solibri to demonstrate the automation of several tasks in the end-to-end ML classification. Our results demonstrate that the quality of data, data extraction techniques, and computing configurations must be carefully designed when applying ML classification to BIM in order to balance constraints of time, cost, and prediction accuracy. Our quality of analytics methods present generic steps and considerations for dealing with such designs, given the time, cost, and accuracy trade-offs required in specific contexts. Thus, the methods could be applied to the design of end-to-end BIM classification systems using other ML techniques and cloud services.

Highlights

In the architecture, engineering, and construction (AEC) industry, Building Information Modeling (BIM) is a key technology for the digital transformation of the industry
We have presented our Quality of Analytics (QoA)-aware machine learning (ML)-based classification system for BIM models
Our extensive study, carried out through integration with the Solibri software product, validated the average accuracy of the ML classification system

Summary

Introduction

In the architecture, engineering, and construction (AEC) industry, Building Information Modeling (BIM) is a key technology for the digital transformation of the industry. Despite the extensive use of classification for grouping and filtering building elements throughout the stages of design, construction, and operation, a remaining critical problem in current classification techniques is that they do not always exhaustively assign all elements to the corresponding categories. They may leave some elements unclassified or misclassified due to several data quality-related risk factors in the AEC domain. If the classification is incomplete or inaccurate, issue checking in the quality assurance process cannot detect object clashes, or cost estimation can erroneously exclude certain elements. To tackle such risk factors, domain knowledge must be incorporated into the data science processes. Understanding the trade-offs between factors of Quality of Analytics (QoA) [5, 6], such as quality of data, execution time, and resulting prediction accuracy, is of paramount importance for ML classification systems for BIM, especially because building models are largely created by professionals through manual design-experiment tasks. The "Conclusions and future work" section concludes the research and outlines future improvements.

Background and related work

Findings

Conclusions and future work