Organizations are vulnerable to cyber attacks because they rely on computer networks and the internet for communication and data storage. While Reinforcement Learning (RL) is a widely used strategy to simulate and learn from these attacks, RL-guided offensives against unknown scenarios often lead to early exposure, since mistakes made during the training phase reduce stealth. To address this issue, this work evaluates whether Knowledge Transfer Techniques (KTT), such as Transfer Learning and Imitation Learning, reduce the probability of early exposure by smoothing out mistakes during training. This study developed a laboratory platform and a method to compare RL-based cyber attacks using KTT in unknown scenarios. The experiments simulated two unknown scenarios using four traditional RL algorithms and four KTT. In the results, although some algorithms using KTT achieved superior performance, the gains in stealth during the initial training epochs were not significant. Nevertheless, the experiments also revealed that, over the entire learning cycle, Trust Region Policy Optimization (TRPO) is a promising algorithm for conducting RL-based cyber offensives.