Using directional antennas for neighbor discovery in wireless ad hoc networks brings notable benefits, such as extended transmission range, reduced interference, and higher antenna gain. However, a directional antenna covers only a limited sector at any one time, so a node lacks knowledge of potential neighbors outside its current beam. Hence, a dedicated antenna direction switching strategy is needed for neighbor discovery with directional antennas. Traditional methods switch antenna directions randomly or according to predefined sequences, overlooking the historical knowledge gained from exploring sectors. In contrast, existing machine learning approaches leverage observed historical knowledge to adjust antenna directions for faster neighbor discovery. Nonetheless, discovery latency remains high because a node cannot fully utilize the observed historical knowledge (i.e., it uses only the knowledge observed in transmission mode and ignores the knowledge observed in reception mode). Moreover, the corresponding reward and penalty mechanisms are too coarse (i.e., they only distinguish sectors with discovered neighbors from sectors with undiscovered neighbors, ignoring the case of sectors that have already been rewarded). In this paper, we model the neighbor discovery process as a reinforcement learning-based learning automaton and propose an enhanced reinforcement learning-based two-way transmit-receive directional antenna neighbor discovery algorithm, called ERTTND. The algorithm consists of a two-way transmit-receive reinforcement learning mechanism (TTRL) and an enhanced reward-and-penalty mechanism (ERAP). It leverages observations made by nodes in both transmission and reception modes to refine their direction-switching decisions.
Then, through the enhanced reward-and-penalty mechanism, nodes optimize their strategies, expediting neighbor discovery with directional antennas in wireless ad hoc networks. Simulation results demonstrate that, compared with existing representative algorithms, the proposed ERTTND algorithm reduces both average discovery delay and energy consumption by more than 30%.
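To make the learning-automaton view concrete, the following is a minimal sketch of a generic linear reward-penalty automaton that keeps one selection probability per antenna sector. It is not the ERTTND update rule: the class name `SectorAutomaton`, the parameters `reward_rate` and `penalty_rate`, and the textbook update equations are assumptions chosen for illustration only. The `discovered` flag stands in for any feedback a node obtains in a slot, whether in transmission or in reception mode.

```python
import random


class SectorAutomaton:
    """Generic linear reward-penalty learning automaton over antenna sectors.

    Illustrative only: the names and update equations are assumptions,
    not the TTRL or ERAP mechanisms of ERTTND.
    """

    def __init__(self, num_sectors, reward_rate=0.1, penalty_rate=0.05, seed=None):
        if num_sectors < 2:
            raise ValueError("need at least two sectors")
        self.reward_rate = reward_rate
        self.penalty_rate = penalty_rate
        # Start with no preference: every sector is equally likely.
        self.probs = [1.0 / num_sectors] * num_sectors
        self.rng = random.Random(seed)

    def choose_sector(self):
        """Sample the next antenna direction from the current probabilities."""
        sectors = range(len(self.probs))
        return self.rng.choices(sectors, weights=self.probs)[0]

    def update(self, sector, discovered):
        """Reward the chosen sector if a neighbor was found, else penalize it."""
        n = len(self.probs)
        a, b = self.reward_rate, self.penalty_rate
        for j in range(n):
            if discovered:
                # Reward: move probability mass toward the chosen sector.
                if j == sector:
                    self.probs[j] += a * (1.0 - self.probs[j])
                else:
                    self.probs[j] *= (1.0 - a)
            else:
                # Penalty: spread mass from the chosen sector to the others.
                if j == sector:
                    self.probs[j] *= (1.0 - b)
                else:
                    self.probs[j] = b / (n - 1) + (1.0 - b) * self.probs[j]


if __name__ == "__main__":
    automaton = SectorAutomaton(num_sectors=4, seed=1)
    automaton.update(sector=0, discovered=True)   # neighbor found in sector 0
    automaton.update(sector=1, discovered=False)  # nothing found in sector 1
    print(automaton.probs, automaton.choose_sector())
```

Both update branches keep the probabilities summing to one, so the vector can be sampled directly in the next slot. A finer-grained scheme, such as one that treats already-rewarded sectors differently, would replace the two branches in `update` with additional cases.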