Abstract

Investigating research trends within a scientific domain by analyzing semantic information extracted from scientific journals has been a topic of interest in the natural language processing (NLP) field. Research trend evaluation is generally based on the time evolution of term occurrences or term topics, but it neglects an important aspect: research publication latency. The average time lag between research and its publication may vary from one month to more than one year, a characteristic that may have a significant impact when assessing research trends, mainly in rapidly evolving scientific areas. To address this problem, the present paper is the first work that explicitly considers research publication latency as a parameter in the trend evaluation process. Consequently, we provide a new trend detection methodology that combines auto-ARIMA prediction with Mann–Kendall trend evaluations. Experimental results from an electronic design automation case study support the viability of our approach.

Highlights

  • For many scientific, industrial, and economic activities, collecting observations over time is a common procedure

  • In the natural language processing (NLP) field, trend analysis plays an important role, a relevant example in this respect being the evaluation of research trends using key term occurrences in scientific literature [1]

  • In order to evaluate the term trend from the term occurrence time series, we propose a two-step approach: (i) to the original time series x_i with i = 1, 2, ..., k, we add N predicted values x_{k+1}, x_{k+2}, ..., x_{k+N} using the auto-ARIMA method presented in Section 2.1; and (ii) we apply the Yue and Wang [24] variant of the MK test, with the Sen’s slope estimation described in Section 2.2, to the concatenated time series x_i with i = 1, 2, ..., k + N
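
The two-step approach in the last highlight can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The forecaster `naive_drift_forecast` is a hypothetical stand-in for auto-ARIMA. The test is the classical Mann–Kendall test with tie correction, so the Yue and Wang variance correction for autocorrelation is omitted. The function names and the sample counts are invented for the example.

```python
import math
from itertools import combinations
from statistics import median


def naive_drift_forecast(series, n_ahead):
    """Stand-in forecaster (NOT auto-ARIMA): extend the series by its mean step."""
    drift = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + drift * h for h in range(1, n_ahead + 1)]


def mann_kendall(series):
    """Classical Mann-Kendall test with tie correction, plus Sen's slope.

    Returns (S, Z, two-sided p-value, Sen's slope). The Yue and Wang
    correction for serial correlation is deliberately left out.
    """
    n = len(series)
    pairs = list(combinations(range(n), 2))  # all index pairs with i < j
    # S = number of increasing pairs minus number of decreasing pairs
    s = sum((series[j] > series[i]) - (series[j] < series[i]) for i, j in pairs)
    # Variance of S under the no-trend hypothesis, corrected for tied values
    tie_counts = {}
    for value in series:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in tie_counts.values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    # Continuity-corrected standard normal statistic
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    # Sen's slope: median of all pairwise slopes
    slope = median((series[j] - series[i]) / (j - i) for i, j in pairs)
    return s, z, p_value, slope


def trend_with_latency(series, n_ahead, forecast=naive_drift_forecast):
    """Step (i): append n_ahead predicted values; step (ii): test the result."""
    extended = list(series) + list(forecast(series, n_ahead))
    return mann_kendall(extended)


if __name__ == "__main__":
    occurrences = [3, 4, 4, 6, 7, 9, 10, 12]  # made-up yearly term counts
    s, z, p_value, slope = trend_with_latency(occurrences, n_ahead=2)
    print(f"S={s}, Z={z:.3f}, p={p_value:.4f}, Sen's slope={slope:.3f}")
```

Passing the forecaster as a parameter keeps the concatenation step independent of the prediction model, so an auto-ARIMA forecaster could be substituted without touching the trend test.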

Summary

Introduction

For many scientific, industrial, and economic activities, collecting observations over time is a common procedure. In the natural language processing (NLP) field, trend analysis plays an important role; a relevant example is the evaluation of research trends using key term occurrences in scientific literature [1]. Because they are less sensitive to outliers [2] and make no assumptions about the data sample distribution [3] or homoscedasticity [4], nonparametric trend tests tend to be favored by researchers over parametric methods. The Mann–Kendall (MK) test, a robust trend indicator when dealing with censored data, arbitrary non-Gaussian data distributions, or time series with missing observations [5], has become an almost standard method for NLP applications [1,6,7,8,9,10,11,12].

