Fluorescent Penetrant Inspection (FPI) is a Non-Destructive Testing (NDT) method used extensively across a broad range of industries to detect defects in components. The FPI process remains a manual visual inspection in which the operator uses a fluorescent dye that penetrates discontinuities in the component to distinguish between relevant indications (i.e., those associated with surface defects) and non-relevant indications (i.e., those associated with insufficient wash-off, dust or other non-relevant factors). The FPI process can be decomposed into the following steps: (a) thorough visual examination of the component, (b) manual wipe-off of the fluorescent dye with a brush from all areas that require interrogation for potential indications, and (c) disposition of the inspected components. The number of areas that require interrogation, and hence wipe-off, is not known prior to the inspection and varies with the condition of the part. As a result, replacing the manual wipe-off step with a robot requires tedious manual programming of an excessive number of robot paths to ensure both that the robot can reach the entire surface of the part and that its motion is safe. In addition, these robot motions are part-specific and thus not transferable to other component geometries, which prevents this technology from scaling across the manufacturing industry. In this paper, we propose a hierarchical robot learning method that reduces manual robot programming and enables the scaling of this automated NDT technology. The proposed method integrates Deep Reinforcement Learning (DRL), Screw Linear Interpolation (ScLERP) and Learning from Demonstration (LfD), enabling the autonomous generation of brushing strokes with a six-degree-of-freedom (DoF) industrial manipulator and automating the wipe-off step of the FPI process.
Using this approach, a policy for the wipe-off motion is first learned in a simulated industrial robotic cell and then transferred to the real system for validation.
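
To make the ScLERP component concrete, the following is a minimal sketch of screw linear interpolation between two end-effector poses. It is not the implementation used in this work: it assumes poses are given as 4x4 homogeneous transforms and uses the matrix exponential and logarithm on SE(3), which produces the same constant-screw motion as the usual dual-quaternion formulation. The function names (`se3_log`, `se3_exp`, `sclerp`) are hypothetical, and rotations close to 180 degrees are not handled.

```python
import numpy as np


def _hat(w):
    """Skew-symmetric matrix of a 3-vector."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def se3_log(T):
    """Return the twist (w, v) whose exponential is the transform T.

    Assumption: the rotation angle is well below pi.
    """
    R, p = T[:3, :3], T[:3, 3]
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-9:
        # Pure translation: the twist has no angular part.
        return np.zeros(3), p.copy()
    W = (R - R.T) * (theta / (2.0 * np.sin(theta)))  # hat(w), |w| = theta
    w = np.array([W[2, 1], W[0, 2], W[1, 0]])
    A = np.sin(theta) / theta
    B = (1.0 - np.cos(theta)) / theta**2
    V_inv = np.eye(3) - 0.5 * W + (1.0 - A / (2.0 * B)) / theta**2 * (W @ W)
    return w, V_inv @ p


def se3_exp(w, v):
    """Return the 4x4 transform generated by the twist (w, v)."""
    T = np.eye(4)
    theta = np.linalg.norm(w)
    if theta < 1e-9:
        T[:3, 3] = v
        return T
    W = _hat(w)
    A = np.sin(theta) / theta
    B = (1.0 - np.cos(theta)) / theta**2
    C = (theta - np.sin(theta)) / theta**3
    T[:3, :3] = np.eye(3) + A * W + B * (W @ W)        # Rodrigues formula
    T[:3, 3] = (np.eye(3) + B * W + C * (W @ W)) @ v
    return T


def sclerp(T0, T1, t):
    """Pose at fraction t in [0, 1] along the constant screw from T0 to T1."""
    w, v = se3_log(np.linalg.inv(T0) @ T1)
    return T0 @ se3_exp(t * w, t * v)
```

Calling `sclerp(T0, T1, t)` for a sequence of `t` values between 0 and 1 yields intermediate poses in which rotation and translation advance together along a single screw axis, which is the property that makes this interpolation a natural candidate for generating a brushing stroke between two poses.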