Perspective is a publicly available, machine learning API that can score text for toxicity. It is available for use in online platforms and communities to limit toxicity and promote civil dialogue. In this work, we adopt a human-centered approach to evaluating Perspective by investigating whether human ratings of toxicity align with Perspective’s toxicity scores. We also test its transferability by making this comparison for comments from three platforms that have different commenting styles and moderation strategies: news websites, YouTube, and Twitter. Apart from toxicity, the main attribute, we collect participant ratings for three additional attributes: respectfulness, formality, and presence of stereotypes. While disrespect is part of how Perspective defines toxicity, formality and presence of stereotypes were included in the study to explore whether they could be hidden/latent attributes that affect toxicity scores from Perspective. We analyzed how participant ratings for these additional attributes vary with respect to Perspective’s toxicity score for comments from each platform. We find that for high toxicity scores, Perspective aligns strongly with participant ratings of toxicity and disrespectfulness across all three platforms, providing weak evidence of its transferability. However, our evaluation also surfaced formality and presence of stereotypes as latent attributes that are unrecognized parts of Perspective’s scores. We discuss how and why this evaluation is “human-centered,” the importance of conducting such evaluations, and the implications of these results for content moderation in social platforms.
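For readers unfamiliar with the API being evaluated, the sketch below shows the general shape of a Perspective request and response. It assumes the public `commentanalyzer` AnalyzeComment endpoint and a `TOXICITY` requested attribute; the API key is a placeholder, and the exact payload details should be checked against the current Perspective API documentation:

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # placeholder; a real key is issued via Google Cloud
ENDPOINT = (
    "https://commentanalyzer.googleapis.com/v1alpha1/"
    f"comments:analyze?key={API_KEY}"
)

def build_request(text: str) -> dict:
    """Build an AnalyzeComment request body asking for a TOXICITY score."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }

def toxicity_from_response(response: dict) -> float:
    """Extract the summary TOXICITY probability (0.0 to 1.0) from a response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

def score(text: str) -> float:
    """Send a comment to Perspective and return its toxicity score."""
    data = json.dumps(build_request(text)).encode("utf-8")
    req = urllib.request.Request(
        ENDPOINT, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return toxicity_from_response(json.load(resp))
```

The returned score is a probability-like value between 0 and 1, which is the quantity compared against participant ratings in this study.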