Abstract

PurposeTo overcome the shortcomings of traditional industry classification systems such as the Standard Industrial Classification Standard Industrial Classification, North American Industry Classification System North American Industry Classification System, and Global Industry Classification Standard Global Industry Classification Standard, the authors explore industry classifications using machine learning methods as an application of interpretable artificial intelligence (AI).Design/methodology/approachThe authors propose a text-based industry classification combined with a machine learning technique by extracting distinguishable features from business descriptions in financial reports. The proposed method can reduce the dimensions of word vectors to avoid the curse of dimensionality when measuring the similarities of firms.FindingsUsing the proposed method, the sample firms form clusters of distinctive industries, thus overcoming the limitations of existing classifications. The method also clarifies industry boundaries based on lower-dimensional information. The graphical closeness between industries can reflect the industry-level relationship as well as the closeness between individual firms.Originality/valueThe authors’ work contributes to the industry classification literature by empirically investigating the effectiveness of machine learning methods. The text mining method resolves issues concerning the timeliness of traditional industry classifications by capturing new information in annual reports. In addition, the authors’ approach can solve the computing concerns of high dimensionality.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.