The academic and research policy communities have long debated the merits of peer review and quantitative citation-based metrics in the evaluation of research. Some have called for replacing peer review with metrics for certain evaluation purposes, while others have called for peer review informed by metrics. Whatever one's position, a key question is the extent to which peer review and quantitative metrics agree. In this paper we study the relationship between three journal metrics, the source normalized impact per paper (SNIP), the raw impact per paper (RIP), and the Journal Impact Factor (JIF), and human expert judgement. Using the journal rating system produced by the Excellence in Research for Australia (ERA) exercise, we examine this relationship over a set of more than 10,000 journals categorized into 27 subject areas. We analyze the relationship along three dimensions: correlation, the distribution of the metrics over the rating tiers, and ROC analysis. Our results show that SNIP agrees most strongly with the ERA ratings, followed by RIP and then JIF, along every dimension measured. Because SNIP is RIP normalized by database citation potential, SNIP's stronger agreement relative to RIP demonstrates that the increase in agreement is due to this normalization factor. Our results suggest that SNIP may be a better choice than RIP or JIF for evaluating journal quality in situations where agreement with expert judgment is an important consideration.
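As a rough illustration of the kind of agreement analysis described above, the sketch below (hypothetical data, column names, and tier encoding; not the paper's code) computes a Spearman rank correlation and a ROC AUC between a journal metric such as SNIP and the ERA rating tiers.

import pandas as pd
from scipy.stats import spearmanr
from sklearn.metrics import roc_auc_score

# Hypothetical input: one row per journal with its ERA tier and a metric value.
df = pd.DataFrame({
    "era_tier": ["A*", "A", "B", "C", "A", "B"],   # expert rating tier
    "snip":     [2.4, 1.8, 1.1, 0.6, 1.9, 0.9],    # metric under test
})

# Rank correlation: encode tiers ordinally (C < B < A < A*) and correlate with the metric.
tier_rank = {"C": 0, "B": 1, "A": 2, "A*": 3}
rho, p = spearmanr(df["era_tier"].map(tier_rank), df["snip"])
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")

# ROC analysis: treat the top tiers (A*/A) as the positive class and use the
# metric as the score that tries to separate them from the lower tiers.
is_top = df["era_tier"].isin(["A*", "A"]).astype(int)
print(f"ROC AUC = {roc_auc_score(is_top, df['snip']):.2f}")

The same two quantities, computed per subject area and per metric (SNIP, RIP, JIF), would give one way to compare how closely each metric tracks the expert ratings.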