Abstract
This paper analyzes the application of natural language processing (NLP) to 10-K and 10-Q company reports. Using the Brain Language Metrics on Company Filings (BLMCF) dataset, which tracks numerous language metrics for 10-K and 10-Q reports, we analyze various lexical metrics, such as lexical richness, lexical density, and specific density. Put simply, lexical richness measures how many unique words the author uses; the idea is that the more varied the author's vocabulary, the more complex the text. Lexical density measures the structure and complexity of human communication in a text; a high lexical density indicates a large share of information-carrying words. Finally, specific density measures how dense the report's language is from a financial point of view, that is, how many finance-related words the text uses. Overall, this type of alternative data produces interesting results. Although lexical richness produced the weakest results of our strategies when applied to an investment universe of 500 stocks, its results improved significantly when we expanded the universe to 3000 stocks. The strategies based on lexical density and specific density improved the Sharpe ratio even further. In the last section, we combine the two metrics (lexical density and specific density) into one strategy. Applying both metrics to the investment universe of 500 stocks produces a Sharpe ratio of 0.688.
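To make the three metrics concrete, the following is a minimal sketch of how they might be computed for a single piece of text. It is not the BLMCF methodology, whose exact definitions are not given here. It assumes lexical richness is the ratio of unique words to total words, lexical density is the share of words that are not function words, and specific density is the share of words found in a finance dictionary. The word lists `FUNCTION_WORDS` and `FINANCE_TERMS` and the function names are hypothetical and deliberately tiny.

```python
import re

# Hypothetical, tiny word lists for illustration only. A real system would
# likely use a part-of-speech tagger or a full stop-word list, and a proper
# finance dictionary.
FUNCTION_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was",
                  "by", "for", "on", "our", "we"}
FINANCE_TERMS = {"revenue", "debt", "equity", "cash", "margin", "assets",
                 "liabilities", "earnings"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_metrics(text):
    """Return assumed versions of the three metrics as fractions in [0, 1]."""
    tokens = tokenize(text)
    if not tokens:
        return {"lexical_richness": 0.0,
                "lexical_density": 0.0,
                "specific_density": 0.0}
    n = len(tokens)
    return {
        # Unique words divided by total words (type-token ratio).
        "lexical_richness": len(set(tokens)) / n,
        # Share of words that carry information (not function words).
        "lexical_density": sum(t not in FUNCTION_WORDS for t in tokens) / n,
        # Share of words that appear in the finance dictionary.
        "specific_density": sum(t in FINANCE_TERMS for t in tokens) / n,
    }


if __name__ == "__main__":
    print(lexical_metrics("The cash and the debt"))
```

Under these assumptions, a stock-selection strategy would rank companies by one of these values computed on their latest filing. Note that a simple type-token ratio falls as documents get longer, so a production metric would probably need some length normalization.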