This paper contributes a tutorial-level discussion of some interesting properties of the recent Cauchy–Schwarz (CS) divergence measure between probability density functions. This measure brings together elements from several different machine learning fields, namely information theory, graph theory, Mercer kernel theory, and spectral theory. These connections are revealed when the CS divergence is estimated non-parametrically using the Parzen window technique for density estimation. An important consequence of these connections is that they enhance our understanding of how these machine learning schemes relate to one another.
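For orientation, a sketch of the quantities involved may help; the notation below (samples $\{x_i\}_{i=1}^{N_1}$ from $p$, samples $\{y_j\}_{j=1}^{N_2}$ from $q$, and a Gaussian Parzen kernel $G_\sigma$) is assumed here for illustration rather than taken from the text above. The CS divergence is commonly written as
\[
D_{\mathrm{CS}}(p,q) \;=\; -\log \frac{\int p(x)\,q(x)\,dx}{\sqrt{\int p^{2}(x)\,dx \,\int q^{2}(x)\,dx}},
\]
which is non-negative and equals zero if and only if $p = q$. Plugging in the Parzen estimates $\hat p(x) = \frac{1}{N_1}\sum_{i=1}^{N_1} G_\sigma(x - x_i)$ and $\hat q(x) = \frac{1}{N_2}\sum_{j=1}^{N_2} G_\sigma(x - y_j)$, the Gaussian convolution property $\int G_\sigma(x-a)\,G_\sigma(x-b)\,dx = G_{\sigma\sqrt{2}}(a-b)$ reduces each integral to a double sum over sample pairs, for example
\[
\int \hat p(x)\,\hat q(x)\,dx \;=\; \frac{1}{N_1 N_2}\sum_{i=1}^{N_1}\sum_{j=1}^{N_2} G_{\sigma\sqrt{2}}(x_i - y_j),
\]
and analogously for $\int \hat p^{2}$ and $\int \hat q^{2}$. It is precisely these pairwise kernel sums that open the door to the graph-theoretic and Mercer kernel interpretations discussed in this paper.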