# Correlation is not causation

Paul Kaplan: Quoting Benjamin Disraeli, Mark Twain famously quipped, "There are three kinds of lies: lies, damned lies, and statistics." In the field of investments, in which we rely heavily on statistical analysis to evaluate the merits of numerous investment strategies and products, Twain's point is all too relevant.

One statistic that is all too easy to misinterpret is correlation, starting with its definition. How many times have we heard that correlation measures the tendency for two variables to move up and down together? That's not quite right. What correlation actually measures is the degree to which two variables, each in excess of its own average, are statistically related.
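To make the definition concrete, here is a minimal Python sketch that computes correlation directly from each variable's deviations from its own average. The function name `correlation` and the two series of annual changes are made up for illustration; they are not data from the video.

```python
import math

def correlation(x, y):
    """Pearson correlation built from deviations about each series' own mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    return cross / math.sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))

# Hypothetical annual changes: both variables go up every single year...
change_a = [1.0, 3.0, 1.0, 3.0]
change_b = [3.0, 1.0, 3.0, 1.0]

# ...yet A is above its own average exactly when B is below its own average.
print(correlation(change_a, change_b))
```

In this toy case both variables rise in every year, so they "move up together," yet the correlation is -1, because correlation looks only at where each variable sits relative to its own average.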

The other major mistake often made with respect to correlation is causation. Seeing that two variables are statistically related, it is all too easy to jump to the conclusion that there is a causal relationship between them. But correlation and causation are two very different things.

According to an article published in Forbes a few decades ago, academics David Leinweber and David Krider drove this point home by showing that there was a very high correlation between the annual level of the S&P 500 and the annual production of butter in Bangladesh. I found the paper by Leinweber in which he presents this result for the period 1981 through 1993. The correlation over this period was about 87%.

I wanted to see if I could create a chart like Leinweber's for the S&P/TSX Composite over a recent period. It didn't take me long to discover that for the Canadian stock market, it's the butter production of Brazil over the period 1994 through 2017 that does the trick. In this chart, I have drawn the level of the S&P/TSX Composite for each year as a blue square and a red line that shows the level of the S&P/TSX Composite predicted by the annual butter production in Brazil. They appear to be strongly related. As in Leinweber's example, the correlation is about 87%.

If you are thinking that there must be some trick to finding dairy production numbers that are correlated with stock market indices, you'd be right. The trick is to use trended variables. Over any period of time, if two variables are both trending upward, such as a stock market index and production in a growing dairy industry, their levels will generally be positively correlated, even if there is no causal link between them.
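The effect of trends is easy to reproduce with simulated data. The sketch below is only an illustration: the series names, starting levels, and growth ranges are assumptions, not the S&P/TSX or Brazilian butter figures. It builds two series whose yearly growth rates are drawn independently of each other and then correlates their levels.

```python
import math
import random

def correlation(x, y):
    """Pearson correlation built from deviations about each series' own mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    return cross / math.sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))

def trending_series(start, low, high, years, rng):
    """Levels that grow each year by a random rate drawn between low and high."""
    levels = [start]
    for _ in range(years):
        levels.append(levels[-1] * (1.0 + rng.uniform(low, high)))
    return levels

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Two made-up series; their growth rates are drawn independently of each other.
index_level = trending_series(1000.0, 0.02, 0.12, 23, rng)
butter_output = trending_series(50.0, 0.01, 0.05, 23, rng)

print(correlation(index_level, butter_output))
```

Because both simulated series rise every year, the correlation of their levels comes out positive even though nothing connects one to the other.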

The solution to trended variables is to remove the trends in both of them. With both stock market indices and production levels, the natural way to detrend them is to take the percentage rate of change of each variable. I did that for the annual levels of the S&P/TSX Composite and annual Brazilian butter production. This chart plots the annual percentage rates of change of both of these variables. Now we get the expected result of almost no correlation (just an insignificant 5%).
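Detrending by percentage change can be sketched in a few lines of Python. The growth rates below are hypothetical and were picked so that the two sets of yearly changes are unrelated by construction; the helper names `pct_change` and `build_levels` are likewise just for this illustration.

```python
import math

def correlation(x, y):
    """Pearson correlation built from deviations about each series' own mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    return cross / math.sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))

def pct_change(levels):
    """Percentage rate of change from each level to the next."""
    return [cur / prev - 1.0 for prev, cur in zip(levels, levels[1:])]

def build_levels(start, growth_rates):
    """Compound a starting level through a list of yearly growth rates."""
    levels = [start]
    for g in growth_rates:
        levels.append(levels[-1] * (1.0 + g))
    return levels

# Hypothetical growth rates, always positive, with unrelated year-to-year patterns.
index_level = build_levels(100.0, [0.10, 0.02, 0.10, 0.02])
butter_output = build_levels(100.0, [0.05, 0.05, 0.01, 0.01])

# Levels share an upward trend, so they are strongly correlated.
print(correlation(index_level, butter_output))

# Percentage changes strip out the trend; here the correlation is essentially zero.
print(correlation(pct_change(index_level), pct_change(butter_output)))
```

The same two series give a high correlation in levels and essentially none in percentage changes, which is the pattern described above for the index and butter production.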

But even if we have constructed the variables properly, correlation is still not causation. If A and B are correlated, it could be that there is a third variable, C, that we cannot observe and that drives both of them.
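A small simulation shows how a hidden third variable can produce correlation without any causal link between A and B. Everything here is an assumption made for illustration: C is standard normal noise, and A and B are each C plus their own independent noise, so neither one affects the other.

```python
import math
import random

def correlation(x, y):
    """Pearson correlation built from deviations about each series' own mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    return cross / math.sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))

rng = random.Random(1)  # fixed seed so the sketch is repeatable
n = 10_000

# C is the common driver; A and B each add their own independent noise to it.
c = [rng.gauss(0.0, 1.0) for _ in range(n)]
a = [ci + rng.gauss(0.0, 1.0) for ci in c]
b = [ci + rng.gauss(0.0, 1.0) for ci in c]

# A and B are clearly correlated, although neither causes the other.
print(correlation(a, b))

# Stripping out C (possible only because this is a simulation) removes the link.
a_net = [ai - ci for ai, ci in zip(a, c)]
b_net = [bi - ci for bi, ci in zip(b, c)]
print(correlation(a_net, b_net))
```

With real data C is not observed, so it cannot simply be subtracted out like this, and the statistics alone cannot tell us whether a C exists.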

When trying to find causation, one has to look to economic reasoning, not just statistical links. This is especially important to keep in mind when evaluating quantitative investment strategies, particularly those that are now implemented through the many new strategic beta ETFs. Any causal explanation has to be made apart from the statistics. Only then can we avoid making statistics the third kind of lie.