Views Article – Sharenet Wealth


Beware of Stock Market Pseudo-Statistics

What if I told you I could provide a 4-month leading indicator for the share price of Naspers? Furthermore, what if this leading “mystery variable” has a historical accuracy of 73% (i.e. R² = 0.73117)? In the right hands, such an indicator would be the proverbial money tree. Even better, I can say that this relationship was inferred over a period of 701 trading days – a large sample size leading to a statistically significant result.

Simply see the graph below – in blue Naspers’ Share Price, in pink the Mystery Variable, and in grey the forecasted period. I have shifted the graph to account for the lead of this mystery variable. Is a downturn in Naspers’ price imminent?


If being correct 73% of the time is not good enough for you, fear not, for I have found a leading indicator for FirstRand that has a coefficient of determination of 0.8!


The correlation is uncanny – in fact, it’s 80% accurate. Rather than keep you in suspense any longer, I can reveal the mystery variables:

Naspers is regressed on the price (in $) of avocados in Boston (USA)
FirstRand is regressed on the price (in $) of avocados in the Great Lakes region (USA)

What does the price of avocados in the USA have to do with Naspers’ or FirstRand’s share price? Absolutely nothing. It’s a fluke, and these correlations exist purely by chance – a coincidental series of dice rolls. Correlations that exist by chance are called “spurious”.
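To see how easily chance produces “uncanny” relationships, consider a minimal sketch (assuming NumPy; the 701-day length mirrors the Naspers example, while the 1,000 pairs is an arbitrary illustration). Each pair below consists of two completely independent random walks – crude stand-ins for two unrelated price series – yet searching across enough pairs reliably turns up an R² rivalling the avocado “indicators”:

```python
import numpy as np

rng = np.random.default_rng(42)

def r_squared(x, y):
    # Coefficient of determination for a simple linear regression of y on x,
    # i.e. the squared Pearson correlation.
    corr = np.corrcoef(x, y)[0, 1]
    return corr ** 2

# Simulate many pairs of completely independent random walks
# (rough stand-ins for two unrelated price series).
n_days, n_pairs = 701, 1000
best = 0.0
for _ in range(n_pairs):
    walk_a = np.cumsum(rng.normal(size=n_days))
    walk_b = np.cumsum(rng.normal(size=n_days))
    best = max(best, r_squared(walk_a, walk_b))

print(f"Highest R² among {n_pairs} unrelated pairs: {best:.2f}")
```

The point is not the exact number but the mechanism: with enough candidate variables, a high R² between two series that share nothing but time is almost guaranteed to appear somewhere.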

My apologies if you feel deceived; that was not my intention. I was employing an argument used by “pseudo-statisticians”. Through statistical terms such as “historically accurate”, “statistically significant”, and “coefficient of determination”, together with convincing graphs, these “statisticians” seek to fool and mislead the reader. Typically there is some ulterior motive, such as building credibility as a DIY analyst of some kind or promoting a questionable product or service. My boss would not appreciate it if I put the company at risk of litigation by using one of these “analysts” as an example, so instead here’s an obvious deception from a publication in an unrelated sector:


Do you notice the obvious flaw? UK sugar consumption per capita is used up to 1970 and thereafter US sugar consumption is used. Ironically, sugar consumption in the UK has declined since the 1970s (contrary to Jamie Oliver’s motives). Declining sugar usage doesn’t fit your narrative? Not to worry, simply use pseudo-statistics to cook the data.

I trust the sugar example illustrates that it’s not a leap to assume there are players in the financial world who intentionally push spurious statistics on unsuspecting individuals for the sake of self-interest. Beware of those posting “miracle” trading strategies or products – even if they are accompanied by rigorous back-testing on renowned software. Back-testing is meaningless if the data falls victim to data snooping or survivorship bias. An experiment is only as good as the data collected. Garbage in, garbage out.
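Data snooping is easy to demonstrate with a minimal sketch (assuming NumPy; the 500 “strategies” and one trading year of returns are arbitrary illustrations, not any real product). Each strategy below is a pure coin flip – long or short each day with no information whatsoever – yet back-testing all of them and reporting only the winner produces an impressive-looking track record:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one simulated market return series with zero edge,
# and 500 "strategies" that are nothing but daily coin flips (long/short).
n_days, n_strategies = 252, 500
market = rng.normal(0.0, 0.01, size=n_days)            # daily returns
signals = rng.choice([-1, 1], size=(n_strategies, n_days))

# "Back-test": cumulative return of each random strategy over the year.
strategy_returns = (signals * market).sum(axis=1)
best_idx = strategy_returns.argmax()

print(f"Best random strategy's back-tested return: {strategy_returns[best_idx]:.1%}")
# The "winner" looks skilled in-sample, but out-of-sample it has no edge at all.
```

Selecting the best of many meaningless back-tests and presenting it alone is exactly the kind of cooked data the sugar chart relies on – which is why out-of-sample testing matters.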

Returning to correlations, have you heard which indicator has been found to be the best for the S&P 500, with an accuracy of 99%? Butter production in Bangladesh (Google it!). I assume by now you realise that this too is spurious. Of course, as any good stats lecturer will make their first-years repeat, “correlation does not imply causation”.

Naturally, if the model doesn’t make fundamental sense, it is more than likely not worth the paper it’s written on – and certainly not worth the phone the tweet appears on. For the record, I am an avid supporter of quantitative models and trading. There are numerous quant strategies that have been proven to work. Models based on mean reversion (such as Sharenet Analytics’ RTA) or momentum have deep roots in sound statistical theory. However, if the strategy is some garage-band statistical experiment unsupported by valid theory, it is one small step from mathematical witchcraft.

Scrubbing away the noise of erratic price movements to reveal the brass tacks at the heart of share price valuation is extremely tricky. Models, algorithms, and quantitative analysis are nothing more than vehicles to execute a good investment strategy and philosophy.

The logic extends to any trading strategy or analytics program you may come across. How can you, perhaps an individual not well versed in statistics, discern a valid model from pseudo-statistics? Simple. Ask the analyst who created the model why it works. If they can’t explain it in a way that makes sense to you, then it’s a model doomed to failure.

By contrast, good models are useful because they eliminate detrimental emotions like fear and greed from trading. Models developed by Sharenet Analytics are based on academic papers and sound mathematical principles, subjected to back-testing, and tested out-of-sample in-house with our own money. This way, clients are assured of receiving a polished and robust product. If you would like early notification of our next model release, complete the form below.



Aidan van Niekerk
Junior Investment Analyst

Aidan is responsible for quantitative research and stock-market analytics. He has a keen interest in all things numbers and – in addition to working for Sharenet – is a full-time BSc Mathematics and Statistics student at Unisa. Aidan has represented South Africa at a professional level in road cycling, where the rigorous demands and self-discipline have prepared him for the markets. He aspires to one day complete a Master’s degree in statistics and delve deeper into the world of algorithmic trading, market analytics, and financial modelling.

Like this article? Get more like it and exclusive market alerts before the general public by subscribing to free Sharenet newsletters using the form below.

