Nowadays, most agricultural robots rely on precise and expensive localisation, typically based on global navigation satellite systems (GNSS) and real-time kinematic (RTK) receivers. Unfortunately, the precision of GNSS localisation significantly decreases in environments where the signal paths between the receiver and the satellites are obstructed. This precision hampers deployments of these robots in, e.g., polytunnels or forests. An attractive alternative to GNSS is vision-based localisation and navigation. However, perceptual aliasing and landmark deficiency, typical for agricultural environments, cause traditional image processing techniques, such as feature matching, to fail. We propose an approach for an affordable pure vision-based navigation system which is not only robust to perceptual aliasing, but it actually exploits the repetitiveness of agricultural environments. Our system extends the classic concept of visual teach and repeat to visual teach and generalise (VTAG). Our teach and generalise method uses a deep learning-based image registration pipeline to register similar images through meaningful generalised representations obtained from different but similar areas. The proposed system uses only a low-cost uncalibrated monocular camera and the robot’s wheel odometry to produce heading corrections to traverse crop rows in polytunnels safely. We evaluate this method at our test farm and at a commercial farm on three different robotic platforms where an operator teaches only a single crop row. With all platforms, the method successfully navigates the majority of rows with most interventions required at the end of the rows, where the camera no longer has a view of any repeating landmarks such as poles, crop row tables or rows which have visually different features to that of the taught row. For one robot which was taught one row 25 m long our approach autonomously navigated the robot a total distance of over 3.5 km, reaching a teach-generalisation gain of 140.