More and more investors are tracking news flows and text sentiment in large, unstructured datasets, mining them in a split second for investment-relevant information. Often, the goal is to incorporate findings into trading strategies and algorithms, to back-test investment theses, or to improve risk monitoring. Finding information of interest in large textual databases is becoming easier as semantic analytic technologies mature. Tools already abound to help companies monitor sentiment toward their brands by evaluating “big data” repositories, such as customer comments. Text and news analytics for investing is a newer niche application, and many are now experimenting with investment use cases. As they do, the need to quantify unstructured data and incorporate it into investing equations continues to grow, along with the volume of textual data: About 2.5 billion gigabytes of data were generated every day in 2012, according to IBM, and by some estimates, 75% of data are unstructured.
The data sources that drive tools for investors are diverse and depend on the analytics vendor. Some vendors allow users to hand-pick data and aggregate it to their own specifications. Examples of data sources for text analytics include news feeds, select blogs, social media data, RSS (rich site summary) feeds, customer relationship management (CRM) data, and audio and TV news transcripts, such as CNBC’s Squawk Box transcripts.
News analytics have been used for years in equity investing, but investors are now applying these tools in global macro strategies and foreign exchange (FX) trading as well as in commodities and fixed-income investing. Early adopters include quantitative hedge funds in the United States and United Kingdom. Machine learning is getting better at interpreting, and coming close to understanding, what news may mean, although many pitfalls still exist.
Most news analytics tools use sentiment analysis algorithms that work with a simple scoring mechanism. Using natural language processing, the algorithm counts the number of positive words and the number of negative words in a text and then assesses which company name appears most in the text. The algorithm assigns a positive or negative score to the text, often including a relevance or probability score for the different entities. It then aggregates this data over the whole news flow and gives users an indication of whether certain companies or terms are trending positive or negative. Some tools score news sentiment and present it as a time series, measure the volume of media coverage, or capture deviations in media coverage that can be telling for investors. Other tools define and track entity relationships in massive textual datasets. The focus of news analytics tools is no longer on speed or reducing latency, according to Armando Gonzalez, president and CEO of RavenPack, a provider of news analytics for quantitative finance applications. “It’s about developing sustainable alpha or verifiable risk management strategies from unstructured data,” he says. “Can users reliably extract a durable signal from the vast amount of news and social content out there that lives for hours, days, weeks, and ideally months?”
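The simple scoring mechanism described above can be illustrated with a minimal Python sketch. It is not any vendor's method: the word lists, the company names, the function names `score_text` and `aggregate`, and the choice to use each company's share of mentions as its relevance score are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy lexicons; production tools use far larger, curated word lists.
POSITIVE = {"gain", "gains", "beat", "beats", "strong", "growth", "upgrade"}
NEGATIVE = {"loss", "losses", "miss", "misses", "weak", "decline", "downgrade"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_text(text, companies):
    """Return (sentiment, relevance) for one news item.

    sentiment: (positive - negative) / (positive + negative), in [-1, 1],
               or 0.0 when no lexicon word appears.
    relevance: each company's share of all company mentions in the text
               (an assumed stand-in for a vendor's relevance score).
    """
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    sentiment = (pos - neg) / total if total else 0.0

    mentions = Counter(t for t in tokens if t in companies)
    n_mentions = sum(mentions.values())
    relevance = {c: k / n_mentions for c, k in mentions.items()}
    return sentiment, relevance


def aggregate(news_flow, companies):
    """Relevance-weighted average sentiment per company over a news flow."""
    weighted_sum = defaultdict(float)
    weight = defaultdict(float)
    for text in news_flow:
        sentiment, relevance = score_text(text, companies)
        for company, r in relevance.items():
            weighted_sum[company] += sentiment * r
            weight[company] += r
    return {c: weighted_sum[c] / weight[c] for c in weight}


if __name__ == "__main__":
    # Hypothetical headlines and single-word company names (lowercase).
    companies = {"acme", "globex"}
    news_flow = [
        "Acme beats estimates on strong growth",
        "Globex shares decline after weak quarter; Acme unaffected",
        "Globex posts loss",
    ]
    for company, score in sorted(aggregate(news_flow, companies).items()):
        trend = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        print(f"{company}: {score:+.2f} ({trend})")
```

A word-counting approach like this ignores negation, context, and multi-word company names, which is one reason the pitfalls mentioned earlier persist. Grouping the per-item scores by date rather than over the whole flow would give the time-series view that some tools present.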