Misstatement Detection Lag and Prediction Evaluation

Abstract

Accounting misstatements are often detected with substantial delays, leading to “look-ahead bias” in model predictions if the detection lag is not considered. Moreover, the misstatement data-generating process is evolving due to regulatory regime shifts, further complicating the evaluation of model predictions. We design an approach that accounts for detection lags and continuously updates models to adapt to the changing data-generating process. By comparing with the conventional approach that ignores detection lags, we show that the look-ahead bias can substantially inflate prediction performance. We also demonstrate that although leaving a temporal gap between training and test samples can mitigate the look-ahead bias, it sacrifices the model’s predictive power by disconnecting the dynamic data-generating process between training and test periods. We further implement a trading strategy to evaluate the practical utility of the continuously updating approach. Our study presents a new conceptual lens for understanding and evaluating misstatement prediction models. Data Availability: Data are available from the public sources identified in the study. JEL Classifications: C53; G32; G38; M41.
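The detection-lag discipline the abstract describes can be sketched as a training-set rule: a misstated firm-year may enter the training sample only once its misstatement has actually been detected. The field names and toy data below are hypothetical, not the paper's implementation:

```python
# A minimal sketch (hypothetical field names, not the paper's code) of
# training splits that respect detection lags: a misstated firm-year may
# enter the training set only after its misstatement has been detected.

def rolling_train_sets(observations, test_years):
    """observations: dicts with 'event_year', 'detection_year', 'x', 'y'."""
    splits = {}
    for t in test_years:
        # Use only label information that was available at prediction time t.
        train = [o for o in observations if o["detection_year"] < t]
        test = [o for o in observations if o["event_year"] == t]
        splits[t] = (train, test)
    return splits

obs = [
    {"event_year": 2001, "detection_year": 2004, "x": 1.0, "y": 1},
    {"event_year": 2002, "detection_year": 2003, "x": 0.2, "y": 0},
    {"event_year": 2005, "detection_year": 2007, "x": 0.9, "y": 1},
]
train, test = rolling_train_sets(obs, [2005])[2005]
```

Re-fitting the model on each test year's eligible training set gives a continuously updating scheme; the conventional approach would instead train on all labeled observations, including those not yet detected at prediction time.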

Similar Papers
  • Research Article
  • Cited by 2
  • 10.1111/1475-679x.12454
Erratum
  • Aug 4, 2022
  • Journal of Accounting Research

Erratum

  • Research Article
  • Cited by 8
  • 10.1007/s10687-018-0327-7
Editorial: special issue on the extreme value analysis conference challenge “prediction of extremal precipitation”
  • Jun 14, 2018
  • Extremes
  • Olivier Wintenberger

At the Extreme Value Analysis conference in Delft in June 2017, a challenge for predicting spatio-temporal extremes was proposed. The aim of the challenge was to estimate high quantiles of daily rainfall and to extrapolate them in time and space. Eight teams competed in the challenge. Each team was given a data set from the training period and, based on it, predicted the corresponding high quantiles for the adjacent test period. The goal was to identify the teams with the best predictive power. [Figure 1: Training and test samples from different periods of observation.] 1. The data. Daily (24-hour) accumulations of precipitation P_{j,t}, j = 1, …, 40 (unit: inches) were recorded at 40 stations in the Netherlands during the 44-year period from 12/31/1972 to 12/31/2016. The training sample corresponds to the 24-year period from 12/31/1972 to 12/31/1995; see Figure 1. The aim was to predict, from the training sample, a quantile of a level corresponding to the extreme monthly precipitation over the next 20 years (the test period from 01/01/1996 to 12/31/2016), station by station. On the daily level, this event corresponds to the 0.998-quantile, i.e., 0.998 = 1 − 0.002 ≈ 1 − 1/(20 × 30) days. Financial support by the ANR network AMERISKA ANR 14 CE20 0006 01 is gratefully acknowledged by Olivier Wintenberger.
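The baseline task in this challenge — a per-station high quantile from the training years — can be sketched with simulated data standing in for the 40 Dutch stations (the gamma draws below are purely illustrative, not the challenge data):

```python
import numpy as np

# A sketch of the baseline task under simulated data: estimate the daily
# 0.998-quantile of rainfall station by station from the training period.
# The gamma draws merely stand in for the 40 Dutch stations' daily records.
rng = np.random.default_rng(0)
train = rng.gamma(shape=0.4, scale=5.0, size=(8766, 40))  # days x stations

q998 = np.quantile(train, 0.998, axis=0)  # one high quantile per station
```

A serious entry would extrapolate beyond the empirical quantile (e.g., via extreme value theory), since the 0.998 level sits near the edge of what the training sample supports.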

  • Research Article
  • Cited by 16
  • 10.1177/03091333221113660
Assessing the effectiveness of alternative landslide partitioning in machine learning methods for landslide prediction in the complex Himalayan terrain
  • Jul 11, 2022
  • Progress in Physical Geography: Earth and Environment
  • Muhammad Tayyib Riaz + 2 more

Several devastating landslides have occurred in the NW Himalayas, prompting researchers to strive for improvements in landslide susceptibility modelling (LSM) methodologies. This research analyzes the effectiveness of alternative landslide partitioning techniques on LSM in the landslide-prone district of Muzaffarabad, Pakistan. We developed a landslide inventory of 961 landslides and traditionally divided it into training (672; 70%) and testing (289; 30%) samples. The training samples (672) were processed with the Average Nearest Neighbour Index (ANNI) method to estimate the spatial pattern of landslides in nature. The results give an ANNI ratio of 0.672, confirming that the landslide distribution pattern is clustered in the complex Himalayan terrain of Muzaffarabad. Of the 672, the majority of landslides (529; 79%) exhibit clustered behaviour, while 189 landslides (21%) exhibit random behaviour. To evaluate the effectiveness of landslide cluster samples in prediction, five machine learning algorithms (MLAs), namely K-Nearest Neighbour (KNN), Naïve Bayes (NB), Random Forest (RF), Extreme Gradient Boosting (XGBoost) and Logistic Regression (LR), were run using the proposed cluster (529) and traditional random (672) training samples along with 17 geo-environmental factors. The testing samples (289; 30%) separated at the initial stage were held fixed to check each model's effectiveness. The areas under the curve (AUC-ROC), sensitivity, specificity, Kappa index and accuracy (ACC) were used to evaluate the MLAs' performance. The alternative (cluster) partitioning technique shows the highest predictive power, with AUC-ROC values ranging from 0.96 to 0.86, Kappa index values from 0.76 to 0.60 and ACC from 0.90 to 0.83. Conversely, the random partitioning approach performs less well, with AUC-ROC values ranging from 0.95 to 0.83, Kappa index values from 0.70 to 0.49 and ACC from 0.87 to 0.80.
In comparison, the RF cluster-sampling-based model outperforms the other models and their counterparts. The RF model achieved the highest accuracy (0.902), highest AUC (0.962) and highest Kappa index (0.755), followed by XGBoost with ACC (0.885), AUC (0.95) and Kappa index (0.724), employing the proposed cluster training samples. Traditional random training samples yield comparatively low ACC for RF (0.868) and XGBoost (0.862). These results confirm that cluster training sampling performs well in obtaining reliable and precise LSMs for complex Himalayan terrain. Although cluster partitioning of landslide training datasets is seldom utilized in LSM, this study highlights that it might be a realistic and reliable approach.
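The ANNI statistic this abstract relies on has a simple closed form: the observed mean nearest-neighbour distance divided by its expectation 0.5/√(n/A) under complete spatial randomness over an area A. A small illustrative sketch (the point pattern below is made up):

```python
import math

# Illustrative sketch of the Average Nearest Neighbour Index (ANNI):
# observed mean nearest-neighbour distance divided by the expectation
# 0.5 / sqrt(n / A) under complete spatial randomness over area A.
# A ratio below 1 (such as the paper's 0.672) indicates clustering.

def anni(points, area):
    n = len(points)
    nearest = [
        min(math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i)
        for i, (xi, yi) in enumerate(points)
    ]
    observed = sum(nearest) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

clustered = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5)]
ratio = anni(clustered, area=100.0)  # well below 1 for this clustered pattern
```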

  • Research Article
  • Cited by 8
  • 10.2139/ssrn.488185
Asset Return Predictability and Bayesian Model Averaging
  • Jan 20, 2004
  • SSRN Electronic Journal
  • Dragon Yongjun Tang

This paper studies model uncertainty associated with predictive regressions in asset return predictability research. We comprehensively investigate the performance of Bayesian model averaging (BMA), first introduced to the literature by Avramov (2002) and Cremers (2002), when applied to linear predictive regressions, using simulation approaches. We find that, in simple settings, BMA performs fairly satisfactorily even when the true model is not in the model set. It can reliably identify the powerful predictors and consistently outperforms other variable selection methods. The results are robust with respect to non-linearity and prior selection. We confirm that BMA attains its best performance when model uncertainty is large, which indicates that it is easier to capture short-run predictability using BMA. However, when we add more structure to the data generating process (DGP), BMA performs less well both in-sample and out-of-sample. BMA mistakes noise variables for true predictors, especially when there is a lot of noise in the model set. For out-of-sample prediction, the overall BMA model shows little advantage over a no-predictability model, and it tends to underpredict. A possible cause is the complex structure we imposed on the DGP.
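The mechanics of averaging over a set of linear predictive regressions can be sketched with BIC-based weights, a common approximation to posterior model probabilities; the paper's own prior specification may differ, and the simulated data below are illustrative only:

```python
import numpy as np

# A minimal sketch of Bayesian model averaging over linear predictive
# regressions. BIC-based weights approximate posterior model probabilities;
# the paper's own prior setup may differ.
rng = np.random.default_rng(1)
n = 200
X = rng.standard_normal((n, 3))             # candidate predictors
y = 0.8 * X[:, 0] + rng.standard_normal(n)  # only predictor 0 matters

def fit_bic(cols):
    Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    return cols, n * np.log(resid @ resid / n) + Z.shape[1] * np.log(n)

models = [fit_bic(c) for c in ([], [0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2])]
bic = np.array([b for _, b in models])
weights = np.exp(-0.5 * (bic - bic.min()))
weights /= weights.sum()                    # approximate posterior weights
best_cols = models[int(np.argmax(weights))][0]
```

A BMA forecast would then be the weight-averaged prediction across all eight models rather than the single best one.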

  • Research Article
  • Cited by 4
  • 10.1080/2150704x.2019.1636153
Non-overlapping classification of hyperspectral imagery
  • Jul 7, 2019
  • Remote Sensing Letters
  • Jing Zhao + 2 more

Because of its high spectral resolution, hyperspectral imagery (HSI) is well suited for classifying land cover types. Spectral information and spatial context are often combined to improve classification performance. For several publicly available hyperspectral datasets, the classification accuracies reach almost 100%. However, we think this high accuracy is mainly due to heavy overlap between the training and test samples generated during the training and test stages. Because the training samples and test samples are randomly sampled from the same images and spatially adjacent to each other, overly optimistic results are produced. In real-life applications, however, the training samples and test samples may be collected from different locations or at different times. In order to improve classification performance when training samples and test samples have low correlation, a low-correlation sampling method and two non-overlapping classification methods are introduced. Experimental results show that, although the reduced correlation between training and test samples reduces classification performance dramatically, the spectral-spatial combined method is still an effective way to improve the classification accuracy of HSI.

  • Research Article
  • Cited by 38
  • 10.1016/j.frl.2004.10.002
Tay's as good as cay: Reply
  • Dec 15, 2004
  • Finance Research Letters
  • Martin Lettau + 1 more


  • Research Article
  • Cited by 4
  • 10.1186/s40854-023-00497-z
Robust monitoring machine: a machine learning solution for out-of-sample R^2-hacking in return predictability monitoring
  • Jul 11, 2023
  • Financial Innovation
  • James Yae + 1 more

The out-of-sample R^2 is designed to measure forecasting performance without look-ahead bias. However, researchers can hack this performance metric even without multiple tests by constructing a prediction model using the intuition derived from empirical properties that appear only in the test sample. Using ensemble machine learning techniques, we create a virtual environment that prevents researchers from peeking into the intuition in advance when performing out-of-sample prediction simulations. We apply this approach to robust monitoring, exploiting a dynamic shrinkage effect by switching between a proposed forecast and a benchmark. Considering stock return forecasting as an example, we show that the resulting robust monitoring forecast improves the average performance of the proposed forecast by 15% (in terms of mean-squared-error) and reduces the variance of its relative performance by 46% while avoiding the out-of-sample R^2-hacking problem. Our approach, as a final touch, can further enhance the performance and stability of forecasts from any models and methods.
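The switching-with-a-benchmark idea in this abstract can be sketched as follows; the window length and variable names are illustrative, not the paper's exact specification:

```python
import numpy as np

# A sketch of the switching idea behind robust monitoring: at each date use
# the proposed forecast only if its trailing mean-squared error beats the
# benchmark's, otherwise fall back to the benchmark. Window length and
# names are illustrative, not the paper's exact specification.

def robust_monitor(y, f_prop, f_bench, window=20):
    out = np.empty_like(y)
    for t in range(len(y)):
        if t < window:
            out[t] = f_bench[t]  # too little history: stay with the benchmark
        else:
            mse_p = np.mean((y[t - window:t] - f_prop[t - window:t]) ** 2)
            mse_b = np.mean((y[t - window:t] - f_bench[t - window:t]) ** 2)
            out[t] = f_prop[t] if mse_p < mse_b else f_bench[t]
    return out

y = np.arange(60, dtype=float)
monitored = robust_monitor(y, y, y + 1.0)  # proposal is perfect in this toy case
```

Because the switch depends only on trailing errors, no information from the future of the test sample leaks into the decision.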

  • Research Article
  • 10.1080/13504851.2022.2159002
Unintended look-ahead bias in out-of-sample forecasting
  • Dec 24, 2022
  • Applied Economics Letters
  • James Yae

This article shows that out-of-sample tests are susceptible to look-ahead bias, not only due to the multiple-testing problem emphasized in the literature. A forecaster often constructs a well-performing model not by trial and error but with an intuition derived from observed empirical patterns in the test sample. Such intuition, however, is unavailable at the beginning of the test sample. Therefore, the forecasting performance reported in an out-of-sample test is possibly exaggerated, even though the forecaster simply utilizes her expertise without any intended p-hacking or fishing. A stylized forecasting model, with an example of stock market return predictability, quantitatively demonstrates this unintended look-ahead bias in out-of-sample tests.

  • Research Article
  • Cited by 2
  • 10.3905/jii.2017.7.4.075
Big Data, Small Pickings: Predicting the Stock Market with Google Trends
  • Feb 28, 2017
  • The Journal of Index Investing
  • Wai Mun Fong

Big data sources such as Google Trends have stimulated much interest in the use of search query volumes for predicting social, business, and financial market trends. A recent paper by Preis, Moat, and Stanley [2013] claimed that a simple trading strategy using the Google Trends keyword "debt" powerfully predicts the Dow Jones Industrial Average stock index one week ahead and outperforms the buy-and-hold strategy by a factor of 20. Using the same sample period as Preis, Moat, and Stanley, we show that "debt" completely loses its predictive power once look-ahead bias is eliminated. We find a similar result with a more recent sample period, from 2011 to 2016. Search terms that do outperform the buy-and-hold strategy generally have no economic meaning and are most likely spurious.
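A stylized version of this kind of search-volume rule, with an explicit data-availability lag, might look like the sketch below (all series and the exact rule are hypothetical simplifications, not the paper's backtest): go short for week t when the last usable search volume exceeds its trailing k-week mean, else go long. Setting lag > 0 delays the signal by the data-release lag, which is the kind of correction that removes look-ahead bias:

```python
# Hypothetical sketch of a search-volume trading rule with a release lag.
# volume[i] is assumed to become usable only `lag` weeks after week i.

def strategy_returns(ret, volume, k=3, lag=0):
    out = []
    for t in range(k + lag + 1, len(ret)):
        i = t - 1 - lag                     # most recent volume usable at week t
        trailing = sum(volume[i - k:i]) / k
        position = -1 if volume[i] > trailing else 1  # short after a search spike
        out.append(position * ret[t])
    return out

volume = [1, 1, 1, 10, 1, 1, 1, 1, 1, 1]
ret = [0.01] * 10
pnl = strategy_returns(ret, volume)           # reacts to the spike at week 3
pnl_lagged = strategy_returns(ret, volume, lag=1)  # same rule, delayed signal
```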

  • Research Article
  • Cited by 101
  • 10.1016/j.frl.2004.10.001
Tay's as good as cay
  • Nov 19, 2004
  • Finance Research Letters
  • Michael J Brennan + 1 more


  • Book Chapter
  • 10.1017/cbo9780511762888.006
Formal Theory and Causality
  • Aug 6, 2010
  • Rebecca B Morton + 1 more

What Is a Formal Model? We turn in this chapter to the Formal Theory Approach (FTA) to causality. The key difference between FTA and the Rubin Causal Model (RCM) is that a formal model serves as the basis for the causal relationships studied. To understand what we mean by FTA, it is useful to define what we mean by a formal model. We define a formal model as a set of precise abstract assumptions or axioms about the data generating process (DGP) presented in symbolic terms that are solved to derive predictions about that process. These predictions are of two types: point predictions and relationship predictions. Point predictions are precise predictions about the values of the variables in the model when the model is in equilibrium, whereas relationship predictions are predictions about how we may expect two variables in the model to be related. Defining what is meant by whether the model is in equilibrium can vary with the model as well; different formal models rely on different equilibrium concepts, which is something that we investigate later in Section 6.5.4. Some of these relationship predictions may be predicted to be “causal” in that changes in one variable “cause” changes in the other variable. Definition 6.1 (Formal Model): A set of precise abstract assumptions or axioms about the DGP presented in symbolic terms that are solved to derive predictions about the DGP. Definition 6.2 (Point Prediction of a Formal Model): A precise prediction from a formal model about the values of the variables in the model when the model is in equilibrium.

  • Research Article
  • Cited by 24
  • 10.1111/jgs.15009
Validation of a Geriatric Trauma Prognosis Calculator: A P.A.L.Li.A.T.E. Consortium Study.
  • Aug 14, 2017
  • Journal of the American Geriatrics Society
  • Allyson C Cook + 14 more

The P.A.L.Li.A.T.E. (prognostic assessment of life and limitations after trauma in the elderly) consortium has previously created a prognosis calculator for mortality after geriatric injury based on age, injury severity, and transfusion requirement called the geriatric trauma outcome score (GTOS). Here, we sought to create and validate a prognosis calculator called the geriatric trauma outcome score ii (GTOS II) estimating probability of unfavorable discharge. Retrospective cohort. Four geographically diverse Level 1 trauma centers. Trauma admissions aged 65 to 102 years surviving to discharge from 2000 to 2013. None. Age, injury severity score (ISS), transfusion at 24 hours post-admission, discharge dichotomized as favorable (home/rehabilitation) or unfavorable (skilled nursing/long term acute care/hospice). Training and testing samples were created using the holdout method. A multiple logistic mixed model (GTOS II) was created to estimate the odds of unfavorable disposition then re-specified using the GTOS II as the sole predictor in a logistic mixed model using the testing sample. The final dataset was 16,114 subjects (unfavorable discharge status = 15.4%). Training (n = 8,057) and testing (n = 8,057) samples had similar demographics. The formula based on the training sample was (GTOS II = Age + [0.71 × ISS] + 8.79 [if transfused by 24 hours]). Misclassification rate and AUC were 15.63% and 0.67 for the training sample, respectively, and 15.85% and 0.67 for the testing sample. GTOS II estimates the probability of unfavorable discharge in injured elders with moderate accuracy. With the GTOS mortality calculator, it can help in goal setting conversations after geriatric injury.
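The GTOS II score itself is stated explicitly in the abstract and is trivial to compute; the logistic mapping from score to discharge probability is estimated in the paper and not reproduced here:

```python
# A direct transcription of the GTOS II score as stated above; the logistic
# mapping from score to discharge probability is estimated in the paper and
# not reproduced here.

def gtos_ii(age, iss, transfused_24h):
    """GTOS II = Age + 0.71 x ISS + 8.79 if transfused by 24 hours."""
    return age + 0.71 * iss + (8.79 if transfused_24h else 0.0)

score = gtos_ii(80, 20, True)  # = 80 + 0.71*20 + 8.79, roughly 102.99
```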

  • Research Article
  • Cited by 28
  • 10.1016/j.neucom.2013.01.036
From the idea of “sparse representation” to a representation-based transformation method for feature extraction
  • Mar 1, 2013
  • Neurocomputing
  • Yong Xu + 4 more


  • Research Article
  • Cited by 11
  • 10.1080/09500340.2017.1380854
Face recognition based on symmetrical virtual image and original training image
  • Oct 31, 2017
  • Journal of Modern Optics
  • Jingcheng Ke + 4 more

In face representation-based classification methods, we can obtain a high recognition rate if a face has enough available training samples. In practical applications, however, we often have only limited training samples. To obtain enough training samples, many methods use the original training samples together with corresponding virtual samples to strengthen the ability to represent the test sample. One approach directly uses the original training samples and corresponding mirror samples to recognize the test sample. However, when the test sample is nearly symmetrical while the original training samples are not, the combination of the original training and mirror samples might not represent the test sample well. To tackle this problem, we propose a novel method that generates virtual samples by averaging the original training samples and their corresponding mirror samples. The original training samples and the virtual samples are then integrated to recognize the test sample. Experimental results on five face databases show that the proposed method is able to partly overcome the challenges posed by the various poses, facial expressions and illuminations of the original face images.
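The virtual-sample construction described above reduces to a flip-and-average operation; a minimal sketch (the toy image below is arbitrary):

```python
import numpy as np

# A sketch of the virtual-sample construction described above: mirror each
# training face horizontally, then average the original with its mirror to
# get a roughly symmetric virtual sample; both sets are pooled for training.

def virtual_samples(faces):
    """faces: array of shape (n_images, height, width)."""
    mirrored = faces[:, :, ::-1]     # horizontal flip of every image
    return (faces + mirrored) / 2.0

faces = np.arange(12, dtype=float).reshape(1, 3, 4)
virtual = virtual_samples(faces)
# Each virtual image is left-right symmetric by construction.
```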

  • Preprint Article
  • 10.5194/egusphere-egu24-6622
Comparison of SWAT and a deep learning model in nitrate load simulation at the Tuckahoe creek watershed in the United States
  • Nov 27, 2024
  • Jiye Lee + 10 more

Simulating nitrate fate and transport in freshwater is an essential part of water quality management. Both numerical and data-driven models have been used for it. The numerical model SWAT simulates daily nitrate loads using simulated flow rate. Data-driven models are more flexible than SWAT, as they can simulate nitrate load and flow rate independently. The objective of this work was to evaluate the performance of SWAT and a deep learning model in terms of nutrient loads when the deep learning model is used for (a) simulating flow rate and nitrate concentration independently and (b) simulating both flow rate and nitrate concentration. The deep learning model was built using long short-term memory (LSTM) and three-dimensional convolutional networks. The input data (weather data and image data, including leaf area index and land use) were acquired at the Tuckahoe Creek watershed in Maryland, United States. The SWAT model was calibrated with data over the training period (2014-2017) and validated with data over the testing period (2019) to simulate flow rate and nitrate load. The Nash-Sutcliffe efficiency (NSE) was 0.31 and 0.40 for flow rate and -0.26 and -0.18 for nitrate load over the training and testing periods, respectively. Three data-driven modeling scenarios were generated for nitrate load: scenario 1 used the flow rate observation and nitrate concentration simulation, scenario 2 used the flow rate simulation and nitrate concentration observation, and scenario 3 used the flow rate and nitrate concentration simulations. The deep learning model outperformed SWAT in all three scenarios, with NSE from 0.49 to 0.58 over the training period and from 0.28 to 0.80 over the testing period. Scenario 1 showed the best results for nitrate load. The performance difference between SWAT and the deep learning model was most noticeable in the fall and winter seasons.
Deep learning models can be an efficient alternative to numerical watershed-scale models when regular high-frequency data collection is available.
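The Nash-Sutcliffe efficiency quoted throughout this abstract has the standard form NSE = 1 − Σ(obs − sim)² / Σ(obs − mean(obs))², where 1 is a perfect fit and values at or below 0 (as for SWAT's nitrate loads) mean the simulation is no better than predicting the observed mean:

```python
import numpy as np

# Nash-Sutcliffe efficiency: 1 is a perfect fit; values <= 0 mean the
# simulation is no better than predicting the mean of the observations.

def nse(obs, sim):
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
```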
