In music information retrieval (MIR) an important research topic, which has attracted much attention recently, is the utilization of user-assigned tags, artist-related style, and mood labels, which can be extracted from music listening web sites, such as Last.fm (http://www.last.fm/) and All Music Guide (http://www.allmusic.com/). A fundamental research problem in the area is how to understand the relationships among artists/songs and these different pieces of information. Co-clustering is the problem of simultaneously clustering two types of data (e.g., documents and words, and webpages and urls). We can naturally bring this idea to the situation at hand and consider clustering artists and tags together, artists and styles together, or artists and mood labels together. Once such co-clustering has been successfully completed, one can identify co-existing clusters of artists and tags, styles, or mood labels (T/S/M). For simplicity, we use the acronym T/S/M to refer to tag(s), style(s), or mood(s) for the rest of the paper. When dealing with tags it is worth noticing that some tags are more specific versions of others. This naturally suggests that the tags could be organized in hierarchical clusters. Such hierarchical organizations exist for styles and mood labels, so we will consider hierarchical co-clustering of artists and T/S/M. In this paper, we systematically study the application of hierarchical co-clustering (HCC) methods for organizing the music data. There are two standard strategies for hierarchical clustering. One is the divisive strategy, in which we attempt to divide the input data set into smaller groups recursively, and the other is the agglomerative strategy, in which we attempt to combine initially individually separated data points into larger groups by finding the most closely related pair at each iteration. We will compare these two strategies against each other. We apply a previously known divisive hierarchical co-clustering method and a novel agglomerative hierarchical co-clustering. In addition, we demonstrate that these two methods have the capability of incorporating instance-level constraints to achieve better performance. We perform experiments to show that these two hierarchical co-clustering methods can be effectively deployed for organizing the music data and they present reasonable clustering performance comparing with the other clustering methods. A case study is also conducted to show that HCC provides us a new method to quantify the artist similarity.