Our study proposes and tests a method for developing domain-specific dictionaries for textual analysis in information systems research. Dictionaries have traditionally been used to classify content by sentiment; we instead introduce an approach for building dictionaries from sentiment-devoid documents. We demonstrate the method by developing a dictionary specific to Securities and Exchange Commission (SEC) investigations. Analyzing 150,432 publicly available SEC documents, we gained insights into the semantics of communications between the SEC and firms. To evaluate the dictionary, we analyzed SEC comment letters to predict the likelihood of firms reporting information technology control weaknesses (ITCWs), information technology audit fees, and cyber risks. Our dictionary outperformed five benchmark dictionaries, explaining a higher proportion of variance in all three outcomes. This study enhances the effectiveness of dictionaries in analyzing sentiment-devoid business and governance documents and yields a specialized dictionary for SEC communications.