When Data Deceives


“If you torture the data long enough, it will confess to anything” – British Economist Ronald H. Coase

In investing, perhaps more than in any other discipline, you can find data on almost anything if you know where to look.  In general, this is a great thing; it allows us to be extremely well informed on many fronts before making an investment decision.  However, any time there is an excess of data, the risk of misinterpreting or manipulating it is high.  Separating what is meaningful from what is noise can become very difficult, especially when there are profits to be had by convincing investors that a given piece of data is in fact meaningful.

So what do we do? Many fall victim to paralysis by analysis and fear doing anything.  Others believe whatever narrative is currently being pushed and bounce from idea to idea, never sticking with a strategy long enough to let it play out.  A prudent investor will make a concerted effort to separate the wheat from the chaff and develop an evergreen investment philosophy without being swayed by the latest chart or infographic.  There are a few important questions to ask when determining whether the data you’re looking at is worth applying:

  1. Is it statistically reliable?  This can be checked using formal statistical methods, but generally one can get a good sense just by taking a step back and giving it the sniff test.  Do I have sufficient periods of good data? Is the result compelling?
  2. Does the result hold out of sample?  For example, if I run an analysis and determine that stocks outperform bonds in the US, then I should test whether this holds true in other countries as well.  If the idea is that there is a premium for owning stocks, it shouldn’t matter which country (or sector, asset class, size, etc.) you choose; stocks should consistently outperform bonds.
  3. Does the result hold in different time periods?  Same idea as above.  If I initially looked at the data from 1980-2000 and confirmed my hypothesis, then I should run the same study from 1960-1980 and 2000-2020 to see if the result holds.  Many times just shifting the endpoints will produce a very different narrative (e.g. looking at 1979-1999 instead of 1980-2000).  This is often done intentionally, with the endpoints chosen in full knowledge of what occurred.  The classic example is a study meant to shine a negative light on stocks that shows only 2000-2009, a period conveniently book-ended by two of the worst stock market crashes in the last 50 years.
  4. Is there a story that makes plausible sense?  In the previous example, it makes sense that stocks should provide a greater return than bonds, given that they come with more inherent risk and volatility.  If, however, I were making the claim that companies that begin with the letter “T” outperform all other companies, there’s pretty much no amount of data that should make me believe that to be true.
  5. Are there any significant outliers within the dataset?  If the original study of stocks v. bonds showed bonds outperforming stocks consistently in 19 of the 20 years, but there was 1 incredible year for stocks that swung the entire time period, this should raise a big red flag. Looking at cross sections of returns in addition to an entire time period can provide some valuable additional information.
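Rules 3 and 5 are easy to check mechanically. What follows is only a minimal sketch, not a full statistical test: the function names are my own, and the return series are made up purely to mirror the "19 bad years and 1 incredible year" scenario in rule 5. They are not real market data.

```python
from math import prod

def annualized_return(returns):
    """Geometric average annual return for a list of yearly returns (0.10 = 10%)."""
    growth = prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

def premium(a, b):
    """Annualized return of series a minus annualized return of series b."""
    return annualized_return(a) - annualized_return(b)

def subperiod_premiums(a, b, n_chunks):
    """Rule 3: split the sample into consecutive equal chunks and recompute
    the premium in each. Leftover years at the end are ignored."""
    size = len(a) // n_chunks
    return [
        premium(a[i * size:(i + 1) * size], b[i * size:(i + 1) * size])
        for i in range(n_chunks)
    ]

def premium_without_best_year(a, b):
    """Rule 5: drop the single year in which a beat b by the widest margin,
    then recompute the premium on what is left."""
    gaps = [x - y for x, y in zip(a, b)]
    k = gaps.index(max(gaps))
    return premium(a[:k] + a[k + 1:], b[:k] + b[k + 1:])

# Hypothetical data: stocks trail bonds for 19 years, then have one huge year.
stocks = [0.02] * 19 + [1.50]
bonds = [0.04] * 20

print("full-period premium:", premium(stocks, bonds))
print("premium by half:", subperiod_premiums(stocks, bonds, 2))
print("premium without best year:", premium_without_best_year(stocks, bonds))
```

With this made-up series the full-period premium is positive, yet it turns negative once the single best year is removed, and the first half of the sample tells the opposite story from the second. That is exactly the kind of red flag the rules are meant to surface.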

The reason I write this is that in recent months I’ve seen many investors fall victim to this trap.  For my entire career, I have touted the benefits of diversifying into the lesser-known parts of the stock market.  By moving some of your portfolio out of the S&P 500, you not only reduce risk by avoiding overconcentration; by investing in small companies and value companies, you can also take advantage of the historical premiums they have provided and create a portfolio with a higher expected return.  Win-win.  This story worked great…until it didn’t.  On September 1st of 2020, one could have looked at the data and seen that over the last 40 years, large stocks had outperformed small stocks, AND growth stocks had outperformed value stocks.  Obviously this story was broken.  How could we go on claiming that investors could expect higher returns by investing in these asset classes when they had underperformed for the last FORTY YEARS?

This concern is completely rational, but it does not tell the complete story.  While forty years is a long time period, and may seem to pass the sniff test for reliability, it is important not to ignore the other reliability tests mentioned above.

The first test is to move the endpoints.  It is very difficult to consider the present just another data point, but a fair and objective analysis demands that we do just that.  Below is a graph of every 40-year rolling time period since 1926 comparing the S&P 500 to Large Value stocks:

The orange represents the annualized rate of return of Large Value stocks over each 40-year period; the blue shows the return of the S&P 500.  A premium existed for every single rolling period except for the tiny period that occurred last fall, where the blue inched ahead.  Looking at this graph without getting stuck in the moment, does it make sense to invest your entire portfolio in the blue going forward?  Or did we just experience a small blip in an otherwise extremely compelling dataset?  The picture below shows the exact same data, only substituting small stocks for value stocks.  Similarly, small stocks showed extremely consistent outperformance over 40-year rolling periods until the small period of large stock outperformance that occurred in the fall of 2020.
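For readers who want to run this kind of rolling-period comparison themselves, here is a rough sketch of the calculation. It assumes you already have two lists of yearly returns covering the same years; the function names and the tiny example series are hypothetical and are not the data behind the graphs described here.

```python
from math import prod

def rolling_annualized(returns, window):
    """Annualized return for every consecutive `window`-year span."""
    if window < 1 or window > len(returns):
        raise ValueError("window must be between 1 and the number of years")
    results = []
    for start in range(len(returns) - window + 1):
        span = returns[start:start + window]
        results.append(prod(1 + r for r in span) ** (1 / window) - 1)
    return results

def share_of_windows_ahead(a, b, window):
    """Fraction of rolling windows in which series a out-earned series b."""
    rolled_a = rolling_annualized(a, window)
    rolled_b = rolling_annualized(b, window)
    wins = sum(x > y for x, y in zip(rolled_a, rolled_b))
    return wins / len(rolled_a)

# Hypothetical yearly returns: "value" leads for five years, then the
# "market" has one very strong year at the end of the sample.
value = [0.10, 0.10, 0.10, 0.10, 0.10, 0.00]
market = [0.08, 0.08, 0.08, 0.08, 0.08, 0.20]

# Using a 3-year window only because the toy series is short;
# the graphs in this post use 40-year windows.
print(share_of_windows_ahead(value, market, window=3))
```

In this toy example only the most recent window favors the market, even though that window is the one an investor standing at the end of the sample would naturally fixate on. Counting how many windows agree is a simple way to treat the present as just another data point.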

So what has happened?  We’ve had an outlier event in which large growth stocks have had a short run of extreme outperformance that skewed longer time periods.  This is exactly the type of data deception I warned against at the outset.  In real time it is extraordinarily difficult not to get sucked into the narrative that “we are in a new paradigm” or “this time is different”.  This is definitely not the first time, and certainly won’t be the last, that the present feels different and stories are spun as to why we should ignore the principles above.  However, those who can distinguish good data from a popular narrative can avoid some of the common traps that so many people have fallen into in the past and will inevitably continue to fall into in the future.
