Abstract

The Lobbying Disclosure Act (LDA) provides, for the first time, data for empirical research on interest groups' behavior and their influence on congressional policymaking, a question that bears directly on the functioning of American democracy. One of the main research challenges is to automatically find the topic(s) of each filing, by short and sparse text classification, in a large corpus of unorganized, semi-structured, and poorly connected lobbying filings, so as to reveal the underlying purpose(s) of these lobbying activities. A common technique for alleviating data sparseness is to enrich the context of the data with external information. This paper instead proposes an interdisciplinary yet practical solution to this problem: a Multi-Topic Meta-Classification (MTMC) scheme built upon a set of semantic attributes (i.e., General Issue, Specific Issue, and Bill Info.) and integrated with a domain-specific Policy Agenda (PA) coding/labeling procedure. First, multi-label base classifiers, transformed into multi-class classification problems, are learned from each of these three semantic sources. Second, to assess the reliability of each base classifier's output, one meta-classifier per attribute is trained on a dataset of meta-instances labeled in a cross-validation fashion. Third, the final prediction is made by fusing the reliable outputs of these ensembles of classifiers. Experiments demonstrate satisfactory classification performance under various evaluation measures on this real-world textual dataset, which poses many challenges, including noisy data and semantic ambiguity.
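
The three-stage scheme described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract does not name the base learner, the meta-classifier, or the fusion rule, so everything here is an assumption made for illustration: a toy token-vote base classifier, a confidence threshold standing in for the meta-classifier, and majority voting for fusion. All function names are hypothetical. The multi-label to multi-class transformation is not shown, so each item carries a single topic label.

```python
from collections import Counter, defaultdict


def train_base(texts, labels):
    """Toy base classifier (assumption): count token/label co-occurrences."""
    counts = defaultdict(Counter)
    for text, label in zip(texts, labels):
        for tok in text.lower().split():
            counts[tok][label] += 1
    return counts


def predict_base(model, text):
    """Return (label, confidence); (None, 0.0) when no token is known."""
    votes = Counter()
    for tok in text.lower().split():
        votes.update(model.get(tok, {}))
    if not votes:
        return None, 0.0
    label, top = votes.most_common(1)[0]
    return label, top / sum(votes.values())


def meta_instances(texts, labels, k=3):
    """Label meta-instances in a cross-validation fashion.

    Each item is predicted by a base classifier trained on the other folds;
    the meta-label records whether that out-of-fold prediction was correct.
    """
    out = []
    for fold in range(k):
        train_idx = [i for i in range(len(texts)) if i % k != fold]
        test_idx = [i for i in range(len(texts)) if i % k == fold]
        model = train_base([texts[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        for i in test_idx:
            pred, conf = predict_base(model, texts[i])
            out.append((conf, pred == labels[i]))
    return out


def train_meta(instances):
    """Toy meta-classifier (assumption): the confidence threshold that best
    separates correct from incorrect base predictions."""
    candidates = sorted({conf for conf, _ in instances}) or [0.0]

    def hits(threshold):
        return sum((conf >= threshold) == ok for conf, ok in instances)

    return max(candidates, key=hits)


def fuse(predictions):
    """Fuse per-attribute outputs given as (label, is_reliable) pairs.

    Majority vote among reliable outputs (assumption); if none is reliable,
    fall back to a vote over all outputs.
    """
    reliable = [lab for lab, ok in predictions if ok and lab is not None]
    pool = reliable or [lab for lab, _ in predictions if lab is not None]
    return Counter(pool).most_common(1)[0][0] if pool else None
```

In use, one base model and one threshold would be trained per attribute (General Issue, Specific Issue, Bill Info.). A base prediction is treated as reliable when its confidence meets that attribute's threshold, and the three (label, is_reliable) pairs are passed to fuse.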
