On White’s (2000) Reality Check

I was asked the following question about White’s reality check the other day.


I was reading White’s (2000) paper (http://www.ssc.wisc.edu/~bhansen/718/White2000.pdf). It seems to suggest that to examine the P&L characteristics of a strategy, we should bootstrap the P&L. I am wondering why this makes sense. Why scramble the P&L time series? Why not scramble the original returns series, e.g., returns of the S&P 500, paste the segments together, and feed the “fake” returns series into the strategy again? Then you can generate as many P&L time series as you like and use them to compute the expected P&L, Sharpe ratio, etc. Would the second approach be more realistic than the approach suggested in the paper?


1) In the statistical literature, there are basically two approaches to tackling the data snooping bias problem. The first and more popular approach focuses on data. It tries to avoid re-using the same data set, either by testing a particular model on randomly chosen subsamples of the original time series or by bootstrapping the original time series (e.g., returns). However, one may argue that this sampling approach is somewhat arbitrary and may lack the desired objectivity. See Chan, Karceski, and Lakonishok (1998) and Brock, Lakonishok, and LeBaron (1992) for examples of this sampling approach. Your proposed method basically falls into this category.
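The data-focused approach from the question can be sketched roughly as follows. This is a minimal illustration, not the method of any of the papers cited: the moving-block resampler, the moving-average rule `ma_strategy_pnl`, the block length of 25, and the simulated `returns` series are all assumptions standing in for real data and a real strategy.

```python
import numpy as np

def block_bootstrap(returns, block_len, rng):
    """Resample a return series by pasting together randomly chosen blocks."""
    n = len(returns)
    n_blocks = -(-n // block_len)  # ceiling division
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    pieces = [returns[s:s + block_len] for s in starts]
    return np.concatenate(pieces)[:n]

def ma_strategy_pnl(returns, window=20):
    """Hypothetical rule: long when the price is above its moving average, else flat."""
    prices = np.cumprod(1.0 + returns)
    pnl = np.zeros_like(returns)
    for t in range(window, len(returns)):
        # The signal uses prices up to t-1 only, to avoid look-ahead.
        if prices[t - 1] > prices[t - window:t].mean():
            pnl[t] = returns[t]
    return pnl

def sharpe(pnl, periods_per_year=252):
    """Annualized Sharpe ratio of a per-period P&L series."""
    sd = pnl.std(ddof=1)
    return 0.0 if sd == 0 else np.sqrt(periods_per_year) * pnl.mean() / sd

rng = np.random.default_rng(0)
returns = rng.normal(0.0003, 0.01, size=1000)  # placeholder for real index returns

# Feed each "fake" return series into the strategy and collect its Sharpe ratio.
sharpes = np.array([
    sharpe(ma_strategy_pnl(block_bootstrap(returns, block_len=25, rng=rng)))
    for _ in range(200)
])
print(f"mean Sharpe: {sharpes.mean():.2f}, 5%-95% range: "
      f"{np.percentile(sharpes, 5):.2f} to {np.percentile(sharpes, 95):.2f}")
```

Resampling in blocks rather than single observations is one way to keep some of the serial dependence in the returns; the block length is a tuning choice and part of what makes this approach feel somewhat arbitrary.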

The second and more “formal” approach focuses on models by considering all relevant models (trading strategies) and constructing a test with properly controlled type I error (test size). Unfortunately, this method is not feasible when the number of strategies being tested is large. White’s paper basically follows the latter approach, but he did it in a smart way that does not suffer from the aforementioned problem.
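To make the model-focused idea concrete, here is a rough sketch in the spirit of White’s reality check: the performance series of all strategies are resampled jointly, and the best observed average performance is compared with the bootstrap distribution of the recentered maximum. It is a simplification under stated assumptions, not White’s full procedure: the function names, the mean block length of 10, and the simulated performance matrix are all hypothetical.

```python
import numpy as np

def stationary_bootstrap_indices(n, mean_block, rng):
    """Time indices made of blocks with geometric lengths, wrapping around the sample."""
    new_block = rng.random(n) < 1.0 / mean_block
    new_block[0] = True
    starts = rng.integers(0, n, size=n)
    idx = np.empty(n, dtype=int)
    for t in range(n):
        # Either jump to a random start or continue the current block.
        idx[t] = starts[t] if new_block[t] else (idx[t - 1] + 1) % n
    return idx

def reality_check_pvalue(f, n_boot=1000, mean_block=10, seed=0):
    """f: (n_periods, n_strategies) performance of each strategy relative to the benchmark."""
    rng = np.random.default_rng(seed)
    n = f.shape[0]
    f_bar = f.mean(axis=0)
    v_obs = np.sqrt(n) * f_bar.max()  # best observed average performance
    v_boot = np.empty(n_boot)
    for b in range(n_boot):
        # The same time indices are used for every strategy, which keeps
        # the dependence across strategies intact.
        idx = stationary_bootstrap_indices(n, mean_block, rng)
        v_boot[b] = np.sqrt(n) * (f[idx].mean(axis=0) - f_bar).max()
    return float(np.mean(v_boot >= v_obs))

rng = np.random.default_rng(1)
noise = rng.normal(0.0, 0.01, size=(500, 50))  # 50 simulated strategies with no edge
edge = noise.copy()
edge[:, 0] += 0.005                            # give one strategy a genuine edge

p_noise = reality_check_pvalue(noise, n_boot=500)
p_edge = reality_check_pvalue(edge, n_boot=500)
print(f"p-value, no edge: {p_noise:.3f}; p-value, one real edge: {p_edge:.3f}")
```

Because only the already-computed performance series are resampled, no strategy has to be re-run on new data, which is why the procedure stays feasible for a large pool of strategies.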

2) However, White’s reality check has a problem of its own. Including poor and irrelevant alternative models (trading strategies) may reduce the rejection probabilities of the test under the null, because the test does not satisfy a relevant similarity condition that is necessary for a test to be unbiased. I did some research and found the following paper by Hansen, which attempts to fix this problem by modifying White’s reality check.
Hansen, P. R. (2005), “A Test for Superior Predictive Ability”, Journal of Business & Economic Statistics, 23, pp. 365-380.

3) Basically, the two approaches answer different questions. Your proposed method asks: will this particular strategy be profitable on a different data set that exhibits similar characteristics to the original one? White’s (2000) reality check, on the other hand, asks: is at least one trading strategy (out of a pool of many seemingly profitable strategies) really profitable for this particular data set? I think a good and conservative approach would be: (a) answer White’s question first, and if the answer is YES, then (b) answer your question by bootstrapping the return time series (and other covariates).

5 thoughts on “On White’s (2000) Reality Check”

  1. Dominik Ballreich

    I had exactly the same question. Thanks for the answer. I would like to ask two little questions regarding the first approach (bootstrapping the timeseries and feeding the fake returns series into the strategy again).

    If, for example, 10000 bootstrapped timeseries are fed into the strategy, is it “allowed” to use the t-test to find out if the true mean is different from 0 for a given alpha?

    Another example: I bootstrap 10000 new stock timeseries of stock returns (and can therefore create 10000 new timeseries of stockprices). Now I use a strategy (for example moving averages) on my new 10000 timeseries and get 10000 results (return p.a.) generated by the strategy. Also, with every timeseries the buy and hold strategy is performed. Now I have the pairs:
    [return p.a._buyandhold_stock_1, return p.a._movingaverage_stock_1]
    [return p.a._buyandhold_stock_1000, return p.a._movingaverage_stock_1000].

    Would it be allowed to use a paired t-test on the differences between the buy and hold and moving averages strategy returns to find out if the mean is zero?

    I would be very glad, if you answered me.

    All the best,

    PS: Please excuse my bad English (-:

  2. Hi Dominik,

    Thank you for your message.

    1. I guess you meant to test the true mean of the PnL of a particular strategy. Yes, a one-sample t-test can usually be used in this case.

    However, the t-test, like many other parametric tests, is not distribution-free. Because the t-test assumes normality, it should NOT be used if this assumption is seriously violated. We may test normality with the Shapiro-Wilk or Kolmogorov–Smirnov test, though it is controversial whether testing the distribution before choosing a test statistic is good practice; see Zimmerman (2004) for example. If the distribution of the PnL is highly skewed, I would recommend using transformed data instead.
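As a small illustration of the one-sample t-test together with a normality check, here is a sketch using `scipy.stats`. The `pnl` array is simulated and only stands in for the PnL values you would actually test; the sample size and parameters are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pnl = rng.normal(loc=0.05, scale=1.0, size=500)  # placeholder for real PnL values

# One-sample t-test of H0: the true mean PnL is zero.
t_stat, p_value = stats.ttest_1samp(pnl, popmean=0.0)

# Shapiro-Wilk test of H0: the sample comes from a normal distribution.
w_stat, p_normal = stats.shapiro(pnl)

# Sample skewness as a quick informal check for asymmetry.
skewness = stats.skew(pnl)

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print(f"Shapiro-Wilk W = {w_stat:.4f}, p = {p_normal:.4f}, skewness = {skewness:.3f}")
```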

    2. Yes, as mentioned in the preceding paragraph, a paired t-test can be used here if the normality assumption is not seriously violated. Alternatively, we may use non-parametric tests (aka distribution-free tests), which have the obvious advantage of not requiring the assumption of normality or the assumption of homogeneity of variance. The non-parametric analogue of the paired t-test is the Wilcoxon signed-rank test (http://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test).

    The following table gives the non-parametric analogues of the paired sample t-test, the independent samples t-test, and Pearson’s correlation test:

    Parametric Test               Non-Parametric Test Analogue
    Paired sample t-test          Wilcoxon signed-rank test
    Independent samples t-test    Mann-Whitney U test
    Pearson’s correlation test    Spearman’s correlation test

    I guess all of them can be found on Wikipedia 🙂
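For reference, each pair of tests in the table has a counterpart in `scipy.stats`. The sketch below runs them on simulated paired returns; the arrays `buy_hold` and `moving_avg` and their parameters are assumptions made up for the illustration, not results from any real strategy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated annual returns per bootstrapped series (placeholders for real results).
buy_hold = rng.normal(0.06, 0.15, size=1000)
moving_avg = buy_hold + rng.normal(0.01, 0.05, size=1000)

# Paired comparison: both strategies are run on the same series.
_, p_paired_t = stats.ttest_rel(moving_avg, buy_hold)
_, p_wilcoxon = stats.wilcoxon(moving_avg, buy_hold)

# Independent-samples comparison (ignores the pairing; shown for the table only).
_, p_indep_t = stats.ttest_ind(moving_avg, buy_hold)
_, p_mann_whitney = stats.mannwhitneyu(moving_avg, buy_hold, alternative="two-sided")

# Correlation between the two strategies' returns.
r_pearson, _ = stats.pearsonr(moving_avg, buy_hold)
r_spearman, _ = stats.spearmanr(moving_avg, buy_hold)

print(f"paired t-test p = {p_paired_t:.2e}, Wilcoxon signed-rank p = {p_wilcoxon:.2e}")
print(f"independent t-test p = {p_indep_t:.3f}, Mann-Whitney U p = {p_mann_whitney:.3f}")
print(f"Pearson r = {r_pearson:.3f}, Spearman rho = {r_spearman:.3f}")
```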

