Compared with obstacle avoidance in open environment, collision-free path planning for duct-enter task is often challenged by narrow and complex space inside ducts. For obstacle avoidance, redundant robot is usually applied for this task. The motion of redundant robot can be decoupled to end-effector motion and self-motion. Current methods for duct-enter task are not robust due to the difficulty to properly define the self-motion. This difficulty mainly comes from two aspects: the definition of distances from robot to obstacles and the fusion of multiple data. In this work, we adapt the ideas underlying the success of human to handling this kind tasks, variable optimization strategies and learning, for one robust path planner. Proposed planner applies reinforcement learning skills to learn proper self-motion and achieves robust planning. For achieving robust behavior, state-action planner is creatively designed with three especially designed strategies. Firstly, optimization function, the kernel part of self-motion, is considered as part of action. Instead of taking every joint motion, this strategy embeds reinforcement learning skills on self-motion, reducing the search domain to null space of redundant robot. Secondly, robot end orientation is taken into action. For duct-enter task, robot end link is the motion starter for exploring movement just like the snake head. The orientation of robot end link when passing through some position can be referred by following links. Hence the second strategy can accelerate exploring by reduce the null space to possible redundant robot manifold. Thirdly, path guide point is also added into action part. This strategy can divide one long distance task into several short distance tasks, reducing the task difficulty. After these creative designs, the planner has been trained with reinforcement learning skills. With the feedback of robot and environment state, proposed planner can choose proper optimization strategies, just like the human brain, for avoiding collision between robot body and target duct. Compared with two general methods, Virtual Axis method with orientation Guidance and Virtual Axis, experiment results show that the success rate is separately improved by 5.9% and 49.7%. And two different situation experiments are carried out on proposed planner. Proposed planner achieves 100% success rate in the situation with constant start point and achieves 98.7% success rate in the situation with random start point meaning that the proposed planner can handle the perturbation of start point and goal point. The experiments proves the robustness of proposed planner.