Voodoo Investment Strategies
Mathematicians on the attack.
Firing a Broadside
Immediately after I finished yesterday's column on technical analysis, a related paper landed on my desk: "Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-Of-Sample Performance."
It's the first paper written by mathematicians that I have ever read. (Certainly, the first that was peer-reviewed for Notices of the American Mathematical Society.) And a pleasure it was. Based on a sample size of one, I can confidently state that mathematicians are much better writers than are finance professors.
They certainly know how to issue a call to arms:
Historically, scientists have led the way in exposing those who utilize pseudoscience to extract a commercial benefit. As early as the 18th century, physicists exposed the nonsense of astrologers. Yet mathematicians in the 21st century have remained disappointingly silent with regard to those in the investment community who, knowingly or not, misuse mathematical techniques such as probability theory, statistics and stochastic calculus. Our silence is consent, making us accomplices in these abuses.
Now, things aren't really that bad. The authors' critiques are familiar; even if mathematicians have been silent about the ills of investment back-testing, others have not. After all, the joke, "If you torture the data long enough, it will confess," was cracked by the economist Ronald Coase, not a mathematician. That said, the authors state their case well, and they offer a couple of solutions.
The paper begins:
Recent computational advances allow investment managers to methodically search through thousands or even millions of potential options for a profitable investment strategy. In many instances, the resulting strategy involves a pseudo-mathematical argument, which is spuriously validated through a simulation of its historical performance (also called a backtest).
A trivial example is the Super Bowl Indicator. According to the Indicator, a Super Bowl victory by a team from the old NFL signals a bull market for that calendar year, while a victory by a team from the old AFL signals a bear market. That's a pseudo-mathematical argument, albeit of the simplest kind, which we can validate by back-testing the Indicator's historical performance. Voila! The Indicator has worked on 35 of 47 occasions.
(I worked up the odds of this occurring randomly, which are low indeed, before realizing that while the winner's league is presumably a 50/50 event, stocks rise more often than they fall. That complicated the calculation--but I'm pretty sure that this pattern remains a less than 0.1% event.)
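The back-of-the-envelope calculation can in fact be made exact. A small sketch of my own (not from the paper): if the winner's league really is a 50/50 event independent of the market, then each year matches the Indicator with probability 0.5*p + 0.5*(1-p) = 0.5, whatever fraction p of years stocks rise, so a plain binomial tail applies.

```python
from math import comb

n, hits = 47, 35

# If the winning league is a 50/50 draw independent of the market, each
# year "matches" the Indicator with probability 0.5*p + 0.5*(1-p) = 0.5,
# no matter what fraction p of years stocks rise. So the chance of 35 or
# more matches in 47 years is an exact binomial tail at p = 0.5.
p_tail = sum(comb(n, k) for k in range(hits, n + 1)) / 2**n
print(f"P(at least {hits} matches in {n} years) = {p_tail:.4%}")
```

The tail probability comes out comfortably below 0.1%, consistent with the rough estimate above.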
The Super Bowl Indicator is easy to dismiss, because there's no reasonable story for why a football league's results would affect equity prices. If the Indicator operated on something even slightly more compelling, however, its spurious and accidental relationship might end up informing an investment strategy.
The paper cites hedge funds seeking profits from seasonal effects.
The problem with this line of questioning is that there is always a time interval that is arbitrarily "optimal," regardless of the cause … While these findings may indeed be caused by some underlying seasonal effect, it is easy to demonstrate that any random data contains similar patterns. [This line of thought also relates to yesterday's "hot hands" discussion.] The discovery of a pattern in sample typically has no bearing out of sample, yet again as a result of overfitting.
Running such experiments without controlling for the probability of backtest overfitting will lead the researcher to spurious claims. Out-of-sample performance will disappoint, and the reason will not be that "the market has found out the seasonal effect and arbitraged away the strategy's profits." Rather, the effect was never there; instead it was just a random pattern that gave rise to an overfitted trading rule.
The authors then demonstrate how easy it is to find a "proven" investment strategy with in-sample data. They created a fictional market, generating 1,000 daily prices that followed a random walk. They then determined an optimal monthly trading rule for this market by testing various holding periods, entry dates (the time of the month for beginning the trade), long-short combinations, and the amount of loss that would trigger exiting the strategy. Sure enough, the authors found a tactic that, per conventional tests of statistical significance, had a 99%-plus chance of having a positive true Sharpe ratio.
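The experiment is easy to reproduce in miniature. The sketch below is my own reconstruction, with hypothetical grid sizes rather than the paper's exact setup: generate 1,000 days of pure-noise returns, then sweep entry day, holding period, and long/short direction for the best in-sample Sharpe ratio. Noise with no true edge reliably yields a strategy that looks attractive.

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=1000)  # pure noise: no true edge

def annualized_sharpe(monthly_pnl):
    # Annualize a series of monthly trade P&Ls (12 trades per year).
    return monthly_pnl.mean() / monthly_pnl.std() * np.sqrt(12)

best_sharpe, best_params = -np.inf, None
# Hypothetical grid: entry day within a 21-trading-day "month", holding
# period, and long vs. short -- a small version of the paper's sweep.
for entry in range(21):
    for hold in range(1, 11):
        for side in (+1, -1):
            pnl = np.array([side * returns[start:start + hold].sum()
                            for start in range(entry, len(returns) - hold, 21)])
            s = annualized_sharpe(pnl)
            if s > best_sharpe:
                best_sharpe, best_params = s, (entry, hold, side)

print(f"best in-sample Sharpe: {best_sharpe:.2f} at {best_params}")
```

Every run of this search "discovers" a rule with a clearly positive in-sample Sharpe ratio, even though the data contain nothing to discover.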
In that instance, the authors thoroughly tested their four parameters to find an optimal mix. However, discovering bogus and apparently significant relationships need not require that much work. As the number of parameters increases, it becomes easier and easier to find an attractive back-tested strategy. According to the authors, the "expected maximum Sharpe ratio" for a "relatively simple strategy with just 7 binomial independent parameters" is 2.6. (That is fabulous; for comparison's sake, the Sharpe ratio for Vanguard 500 Index Fund (VFIAX) during the past five years, through the roaring bull market, is a mere 1.3.)
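The relationship between trial count and the best Sharpe ratio found is easy to see directly. In the Monte Carlo sketch below (my own illustration, not the paper's formula), each "strategy" is 250 days of pure noise, so its true Sharpe ratio is zero; yet the best in-sample Sharpe among N candidates climbs steadily with N.

```python
import numpy as np

rng = np.random.default_rng(1)

def avg_best_sharpe(n_trials, n_obs=250, reps=100):
    """Average of the best annualized in-sample Sharpe ratio found
    among n_trials independent pure-noise daily-return strategies."""
    best = []
    for _ in range(reps):
        r = rng.normal(0.0, 0.01, size=(n_trials, n_obs))
        sharpes = r.mean(axis=1) / r.std(axis=1) * np.sqrt(250)
        best.append(sharpes.max())
    return float(np.mean(best))

for n in (1, 10, 100, 1000):
    print(f"{n:>5} trials -> best in-sample Sharpe ~ {avg_best_sharpe(n):.2f}")
```

With these settings, the best Sharpe ratio among 100 pure-noise trials averages roughly 2.5, in the neighborhood of the paper's 2.6 figure; one trial averages roughly zero, as it should.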
A Possible Fix
So, what to do? The authors have two suggestions.
First, investment professionals and the academic community both should publish the number of trials that were used in determining the strategy. After all, a result that by statistical measures would only occur randomly once in 1,000 draws isn't very special if the researcher took 1,000 draws! The authors go so far as to call that a "kind of fraud."
They offer the analogy of medical drug testing, which has been severely criticized on similar grounds. In that field, a project is currently under way to document all test results, whether positive or negative. The authors note that Johnson & Johnson has announced its intention to make all its test results publicly available. It would behoove the investment community to adopt a similar policy, they suggest.
Second, the paper advocates out-of-sample testing. The prescription for that isn't clear. The authors give an example of researchers announcing a preliminary finding, then withholding publication until that finding had been tested on the next six months of out-of-sample data. That does not seem to be a generalizable solution, as many investment tactics cannot be judged on six months' worth of data, or even six years' worth.
I would suggest instead a common policy of slicing the data sample in half. Test the first half, while recording the number of trials, then run the apparently winning strategies on the second half of the data. Under that approach, an investment tactic that was discovered in relatively few trials and that continued its success in the second half of the data set would clear the hurdle and be acceptable for publication. Tactics that were not as rigorously determined could of course be used by portfolio managers, but they would not be regarded as meeting acceptable standards for either white papers or academic journals.
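As an illustration of that policy (my own sketch, with a made-up rule and hypothetical grid, not a real strategy): search the first half of a pure-noise sample for the best rule while counting trials, then score that single winner, once, on the second half. The out-of-sample Sharpe ratio routinely collapses.

```python
import numpy as np

rng = np.random.default_rng(7)
returns = rng.normal(0.0, 0.01, size=2000)  # pure noise, no true edge
train, holdout = returns[:1000], returns[1000:]

def rule_pnl(r, entry, side, hold=5):
    # One trade per 21-trading-day "month": enter on day `entry` of the
    # month, hold 5 days, long (+1) or short (-1).
    return np.array([side * r[s:s + hold].sum()
                     for s in range(entry, len(r) - hold, 21)])

def sharpe(pnl):
    return pnl.mean() / pnl.std() * np.sqrt(12)  # annualized, monthly trades

trials, best = 0, (-np.inf, None)
for entry in range(21):            # first half: search, and count trials
    for side in (+1, -1):
        trials += 1
        s = sharpe(rule_pnl(train, entry, side))
        if s > best[0]:
            best = (s, (entry, side))

oos = sharpe(rule_pnl(holdout, *best[1]))  # second half: one shot only
print(f"{trials} trials; in-sample Sharpe {best[0]:.2f}, "
      f"out-of-sample {oos:.2f}")
```

Publishing the trial count alongside both numbers makes the overfitting visible: an impressive in-sample Sharpe ratio found after 42 tries carries far less evidence than the same number found on the first attempt.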
John Rekenthaler has been researching the fund industry since 1988. He is now a columnist for Morningstar.com and a member of Morningstar's investment research department. John is quick to point out that while Morningstar typically agrees with the views of the Rekenthaler Report, his views are his own.
John Rekenthaler does not own (actual or beneficial) shares in any of the securities mentioned above.