Abstract
In this poster we describe a pilot study of searching the social science literature for legacy corpora that can be used to evaluate text mining algorithms. The emerging field of computational social science demands large amounts of social science data to train and evaluate computational models. We argue that legacy corpora annotated by social science researchers through traditional Qualitative Data Analysis (QDA) are ideal data sets for evaluating text mining methods such as text categorization and clustering. As a pilot study, we searched leading communication journals for articles that involve content analysis and discourse analysis, and then contacted the authors regarding the availability of the annotated texts. Regrettably, nearly all of the corpora that we found were not adequately maintained, and many were no longer available, even though they were less than ten years old. This situation calls for more effort to maintain and use legacy social science data for future computational social science research.
Highlights
Bei Yu, Syracuse University. Follow this and additional works at: https://surface.syr.edu/istpub. Part of the Library and Information Science Commons, and the Linguistics Commons.
The emerging field of computational social science aims to use computational models to analyze large amounts of data to “reveal patterns of individual and group behaviors” (Lazer et al., 2009).
One subarea of computational social science uses machine learning and natural language processing techniques to automatically analyze large amounts of text, especially user-generated content on the Web, in order to understand the topics, perspectives, moods, personalities, and many other aspects that humans manifest in language.
Summary
Recommended Citation: Yu, B. and Ku, M. (2010). Collecting legacy corpora from social science research for text mining evaluation. ASIST 2010 Annual Meeting, Pittsburgh, PA, October 22-27, 2010.