Self-supervised blind-mask denoising networks overcome the need for clean training targets by applying a mask to the raw noisy data to form the training input while using the unmasked data as the network's target. Such networks have shown considerable strength in suppressing both random and coherent noise in seismic data. However, because they must infer the target (the clean signal) on their own, they struggle at low signal-to-noise ratios. Seismic-while-drilling acquisitions produce seismic data of very low quality because the drilling operation introduces significant noise propagating from the rig site. Because the rig noise is consistent and low-frequency, it is hard to design a noise mask that hides it from the network without also hiding useful information required for predicting the signal. However, by reframing the task from noise suppression to noise prediction and using a mask to hide the signal from the network, the rig noise can be predicted accurately. Subtracting the network's prediction from the raw data then yields common bit gathers (equivalent to shot gathers, but continuous) with a significantly higher signal-to-noise ratio due to the removal of rig noise. Illustrated on six common bit gathers, this reversed methodology is shown to separate the rig noise and signal, even in their shared bandwidth. The additional use of explainable artificial intelligence is investigated as a means of avoiding the manual step of creating the signal mask, with promising results. This study lays the groundwork for the suppression of high-amplitude, consistent noise, such as that arising from well site operations like fluid injection for carbon sequestration or geothermal energy production.
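
The reversed workflow described above (hide the signal, predict the rig noise, subtract the prediction from the raw data) can be illustrated with a minimal NumPy sketch. It is not the study's implementation: the function names, the zero fill value, and the synthetic gather are all assumptions made for illustration, and `toy_noise_predictor` is a hypothetical stand-in for the trained network that simply averages the visible traces at each time sample. The sketch also assumes, as a simplification, that the rig noise is identical on every trace.

```python
import numpy as np


def make_training_pair(raw, signal_mask, fill_value=0.0):
    """Build a blind-mask training pair (hypothetical helper).

    signal_mask is True where signal is expected; those samples are
    hidden in the input, while the target stays the unmasked raw data.
    """
    masked_input = np.where(signal_mask, fill_value, raw)
    return masked_input, raw


def toy_noise_predictor(masked_input, signal_mask):
    """Stand-in for the trained network (NOT a neural network).

    Estimates the noise at each time sample as the mean over the traces
    that are not masked at that sample.
    """
    visible = ~signal_mask
    counts = visible.sum(axis=0)
    sums = np.where(visible, masked_input, 0.0).sum(axis=0)
    per_sample = sums / np.maximum(counts, 1)
    return np.tile(per_sample, (masked_input.shape[0], 1))


# Synthetic gather: 8 traces, 200 time samples (illustrative sizes only).
n_traces, n_samples = 8, 200
t = np.arange(n_samples)

# Assumed rig noise: high-amplitude, low-frequency, identical on all traces.
rig_noise = np.tile(5.0 * np.sin(2 * np.pi * t / 100.0), (n_traces, 1))

# Assumed signal: a weak, short event that moves out from trace to trace.
signal = np.zeros((n_traces, n_samples))
signal_mask = np.zeros((n_traces, n_samples), dtype=bool)
for i in range(n_traces):
    start = 50 + 10 * i
    signal[i, start:start + 5] = 1.0
    signal_mask[i, start:start + 5] = True

raw = rig_noise + signal

# Hide the signal, predict the noise, subtract it from the raw data.
masked_input, target = make_training_pair(raw, signal_mask)
predicted_noise = toy_noise_predictor(masked_input, signal_mask)
denoised = raw - predicted_noise
```

In this idealized setting the subtraction recovers the synthetic signal because the predictor never sees the masked samples and the noise is perfectly repeatable across traces. Real rig noise varies between traces, which is why a trained network rather than a simple average would be needed.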