Abstract

We introduce a systematic method for estimating an economic indicator published by the Japanese government by analyzing large-scale Japanese blog data. The explanatory variables are monthly word frequencies. As candidate words for describing the economic index, we adopt the 1352 words in the economics and industry section of the Nikkei thesaurus. From this large set of words, our method automatically selects those that correlate strongly with the economic indicator, and it resolves statistical difficulties such as spurious correlation and overfitting. As a result, our model reproduces the real economic index reasonably well. The government's announcement of an economic index usually comes with a time lag, whereas our proposed method can work in real time.

Highlights

  • Analyzing social big data such as blogs and Twitter uncovers many statistical properties of human behavior

  • We examine the relation between a composite index (CI) and big blog data

  • Comparing I and Ī (Figure 9a), we find that sudden peaks are reduced in Ī, which fluctuates around the monthly economic index (CI)

Summary

Introduction

Analysis of social big data such as blogs and Twitter has uncovered many statistical properties of human behavior. Consider three variables: temperature, ice cream sales, and soft drink sales. Calculating the correlations among these values, we will probably find that all of them are positive. When the temperature rises, people tend to buy more ice cream and more soft drinks to get relief from the heat; it is not reasonable to conclude that soft drink sales increase because ice cream sales rise. In this case, the positive correlation between ice cream and soft drink sales is spurious, because both are driven by temperature. When we handle multivariate data with non-normal distributions and non-stationary dynamics, it is important to check carefully whether the Pearson correlation coefficient and regression analysis are appropriate. We introduce a method that removes these difficulties and estimates the economic index using big blog data.
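The ice cream example can be made concrete with a small numerical sketch. The Python snippet below uses synthetic data in which temperature drives both sales series; all coefficients, noise levels, and the helper name `partial_corr` are assumptions chosen for illustration. It compares the raw Pearson correlation with a partial correlation that linearly removes temperature. Partial correlation is used here only as a familiar way to expose a spurious correlation and is not claimed to be the procedure of the paper.

```python
import numpy as np


def partial_corr(x, y, z):
    """Correlation between x and y after linearly removing z from both."""
    # Regress x and y on z (with an intercept) and keep the residuals.
    design = np.column_stack([z, np.ones_like(z)])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


# Synthetic data (assumed values): temperature is the common cause.
rng = np.random.default_rng(0)
n = 500
temperature = rng.normal(20.0, 8.0, size=n)
ice_cream = 3.0 * temperature + rng.normal(0.0, 5.0, size=n)
soft_drinks = 2.0 * temperature + rng.normal(0.0, 5.0, size=n)

# Raw correlation is strongly positive although neither series causes the other.
raw_r = float(np.corrcoef(ice_cream, soft_drinks)[0, 1])

# Controlling for temperature leaves only the independent noise terms.
partial_r = partial_corr(ice_cream, soft_drinks, temperature)

print(f"raw correlation:     {raw_r:.3f}")
print(f"partial correlation: {partial_r:.3f}")
```

With these settings the raw correlation should come out large and the partial correlation close to zero, which is the signature of a correlation induced by a shared driver.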

Development of a New Economic Index Based on a Comprehensive Search
Data Pre-Processing
Extraction of Words by One-Body Correlation
Grouping
Visualization of Relationships Between Variables
Regression Analysis
Daily Index and Smoothing
Findings
Discussion
