Abstract

Background and objective
Accurate segmentation of retinal vessels from color fundus images plays a significant role in the early diagnosis of various ocular, systemic and neurodegenerative diseases. Segmenting retinal vessels is challenging because of the varying vessel caliber, the proximity of pathological lesions, the strong central vessel reflex and the relatively low image contrast. Most existing methods rely mainly on carefully designed hand-crafted features to model the local geometric appearance of vascular structures, and these features often lack the discriminative power needed to separate vessels from a noisy, cluttered background.

Methods
We propose a novel visual attention guided unsupervised feature learning (VA-UFL) approach that automatically learns the most discriminative features for segmenting vessels in retinal images. VA-UFL combines a visual attention mechanism with multi-scale contextual information to selectively focus on the most relevant part of the structure within a given local patch. This allows rich hierarchical information to be encoded into unsupervised filter learning, which generates a set of highly discriminative features that enable accurate vessel segmentation even against a cluttered background.

Results
The proposed method is validated on five publicly available retinal datasets: DRIVE, STARE, CHASE_DB1, IOSTAR and RC-SLO. The experimental results show that the proposed approach significantly outperformed state-of-the-art methods in terms of sensitivity, accuracy and area under the receiver operating characteristic curve (AUC) across all five datasets. Specifically, the method achieved an average sensitivity greater than 0.82, which is 7% higher than all existing approaches validated on the DRIVE, CHASE_DB1, IOSTAR and RC-SLO datasets, and it outperformed even the second human observer.
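The three metrics reported above have standard pixel-wise definitions: sensitivity = TP / (TP + FN), accuracy = (TP + TN) / (TP + TN + FP + FN), and AUC, the probability that a randomly chosen vessel pixel receives a higher score than a randomly chosen background pixel. The sketch below computes them from a probability map and a ground-truth mask. It is a minimal illustration, not the authors' evaluation code: the function names, the 0.5 threshold and the optional field-of-view (FOV) mask are assumptions, and AUC is computed here via the Mann-Whitney rank statistic.

```python
import numpy as np


def confusion_counts(pred, truth, fov=None):
    """Count TP, TN, FP, FN between binary maps, optionally inside an FOV mask (assumed)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    fov = np.ones_like(truth, dtype=bool) if fov is None else np.asarray(fov, dtype=bool)
    p, t = pred[fov], truth[fov]  # boolean indexing flattens to the pixels inside the FOV
    tp = int(np.sum(p & t))
    tn = int(np.sum(~p & ~t))
    fp = int(np.sum(p & ~t))
    fn = int(np.sum(~p & t))
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of true vessel pixels that are detected."""
    return tp / (tp + fn)


def accuracy(tp, tn, fp, fn):
    """Fraction of all evaluated pixels that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def roc_auc(scores, truth, fov=None):
    """AUC via the Mann-Whitney U statistic, using average ranks for tied scores."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    if fov is not None:
        m = np.asarray(fov, dtype=bool)
        scores, truth = scores[m], truth[m]
    else:
        scores, truth = scores.ravel(), truth.ravel()
    n = scores.size
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(n, dtype=float)
    i = 0
    while i < n:
        # Find the run of tied scores [i, j] and give each the average 1-based rank.
        j = i
        while j + 1 < n and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    n_pos = int(truth.sum())
    n_neg = n - n_pos
    u = ranks[truth].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy 2x2 example: hypothetical vessel probabilities and ground truth.
    truth = np.array([[1, 0], [1, 0]])
    scores = np.array([[0.9, 0.2], [0.6, 0.7]])
    tp, tn, fp, fn = confusion_counts(scores >= 0.5, truth)  # assumed threshold of 0.5
    print(sensitivity(tp, fn), accuracy(tp, tn, fp, fn), roc_auc(scores, truth))
```

On the toy example this prints `1.0 0.75 0.75`. Sensitivity and accuracy depend on the chosen threshold, whereas AUC summarizes performance over all thresholds, which is why segmentation studies usually report both kinds of metric.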
The method is shown to be robust in segmenting thin vessels and in handling strong central vessel reflex and complex crossover structures, and it performs well on abnormal cases.

Conclusions
The discriminative features learned via the visual attention mechanism are superior to hand-crafted features, and the approach is easily adaptable to various kinds of datasets for which large numbers of training images are often unavailable. Hence, our approach can be easily integrated into large-scale retinal screening programs, where expensive labelled annotations are often unavailable.