This paper addresses the significance of preprocessing the big data collected during tunnel boring machine (TBM) excavation before they are used for machine learning in various TBM performance predictions. The research work is based on two water diversion tunneling projects that cover 29.52 km and 17 051 boring cycles. It has been found that the penetration rate calculated from the raw measured penetration distances exhibits more random behavior, owing to the percussive and vibratory behavior of the cutterhead. A moving average method is recommended for processing the negative instantaneous velocities, and a noise reduction filter for dealing with signals with abnormal frequencies. The drilling efficiency index is introduced to assess the relationships between the mechanical parameters in a boring cycle; its linear regression coefficient R² is used for a preliminary investigation of possible problems requiring preprocessing. The research work defines irrelevant data as data whose errors are caused by human or mechanical mistakes and which should therefore be cleaned or amended. These irrelevant data can be divided into five categories: (1) premature cycles, (2) sensor defects, (3) mechanical defects, (4) human interruption, and (5) missing files. A program, TBM-Processing, has been coded for the recognition and classification of these categories. PDF books generated by the program have been uploaded to GitHub to encourage discussion, collaboration, and upgrading of the data processing work with our peers.
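
As an illustration of the moving average step mentioned above, the following is a minimal sketch and not the TBM-Processing implementation. It assumes penetration distances sampled at a fixed interval; the readings, the function names `penetration_rate` and `moving_average`, the sampling interval `dt_s`, and the window length of 5 are all hypothetical choices made for this example.

```python
import numpy as np

def penetration_rate(distance_mm, dt_s=1.0):
    """Instantaneous velocity from consecutive penetration distance readings."""
    distance_mm = np.asarray(distance_mm, dtype=float)
    return np.diff(distance_mm) / dt_s

def moving_average(values, window=5):
    """Simple moving average; returns only fully covered windows."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Hypothetical readings (mm): vibration makes some readings step backwards.
distance = np.array([0.0, 1.2, 0.9, 2.1, 3.0, 2.8, 4.1, 5.0, 4.7, 6.0, 7.1])

raw = penetration_rate(distance)        # contains negative velocities
smooth = moving_average(raw, window=5)  # averaged over 5 samples

print("negative raw samples:", int((raw < 0).sum()))
print("negative smoothed samples:", int((smooth < 0).sum()))
```

In this toy series the backward steps produce negative instantaneous velocities in `raw`, while every 5-sample average is positive. A suitable window length for real TBM data would depend on the sampling rate and the vibration characteristics, which the abstract does not specify.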