One of the key features of most parts of the economy is that outliers matter, a lot.
This is often hard for us to get our heads around. We are used to thinking about averages, but a lot of important things are distributed in a manner where thinking about averages can be disastrous.
I thought of this again when I happened to look at my citations on Google Scholar last week. So far, I have 32 publications listed there, with a total of 92 citations. That means that the average number of citations that my papers receive is 2.875 – which, as a relatively new academic, I’m pretty happy about.
We usually expect this sort of thing to follow a normal distribution. For my average citation count of 2.875, a normal distribution would look like this (I made this with an amazingly useful tool put together by Jon Wittwer):
This graph has a few implications. The first is that if you listed the papers by the number of citations, the most cited paper would be cited 9 or 10 times at most. Second, if you took the middle paper by rank (the median), it would be cited something close to 2.875 times (the average). In a normal distribution, the values for the median and the average are pretty close. Finally, the majority of the papers (about 70%) would be cited between 1 and 5 times.
With those ideas in mind, now take a look at my actual citations:
Out of the 32 papers, only 9 have been cited. The citation counts for those are 48, 11, 11, 8, 8, 4, 4, 3, and 1. In a normal distribution, the odds of getting any papers cited more than 10 times are essentially 0, yet three of mine have been. And the one that’s been cited 48 times is so far off the scale in a normal distribution that it would be viewed as utterly and completely beyond imagining.
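To put a rough number on "essentially 0", here is a minimal sketch of the tail probabilities under a normal curve. The mean of 2.875 comes from the figures above; the standard deviation of 2 is my assumption, picked so that roughly 70% of papers land between 1 and 5 citations. The helper name `normal_tail` is just for illustration.

```python
import math

def normal_tail(x, mean, sd):
    """Return P(X > x) for a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

MEAN = 2.875  # average citations per paper, from the post
SD = 2.0      # assumption: puts roughly 70% of papers between 1 and 5 citations

for threshold in (10, 48):
    p = normal_tail(threshold, MEAN, SD)
    print(f"P(citations > {threshold}) = {p:.3g}")
```

Under these assumptions, a paper with more than 10 citations comes out at well under one in a thousand, and one with 48 is so improbable that the number is hard to distinguish from zero.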
But numbers like that are pretty normal for citation counts.
The other contrasts with a normal distribution are also interesting. The median citation count for me is 0, which doesn’t line up very well with the average. And only 4 of the papers (12.5%) have been cited between 1 and 5 times, not 22 as you would expect with a normal distribution.
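These summary numbers are easy to check from the counts listed above. This is a quick sketch using only Python's standard library; the variable names are mine. One caveat: the nine listed counts add up to 98, which differs slightly from the 92 total quoted earlier, so the mean computed here comes out a little above 2.875.

```python
from statistics import mean, median

cited = [48, 11, 11, 8, 8, 4, 4, 3, 1]        # citation counts listed in the post
citations = cited + [0] * (32 - len(cited))    # the other 23 papers are uncited

avg = mean(citations)
med = median(citations)
in_band = sum(1 for c in citations if 1 <= c <= 5)  # papers cited 1 to 5 times

print(f"mean = {avg}, median = {med}")
print(f"cited 1 to 5 times: {in_band} of {len(citations)} "
      f"({in_band / len(citations):.1%})")
```

The gap between the mean and the median is the giveaway: one heavily cited paper drags the average up while the typical paper sits at zero.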
The distribution of citations is clearly abnormal. They tend to follow a power law distribution – which is one of many distributions that have a very fat tail. John Hagel explains the difference between normal (Gaussian) and fat-tailed (Paretian) distributions very well in this outstanding post. He says:
But, as with most things in business (and in life), mindsets become a key stumbling block. McKelvey and Boisot describe the “Gaussian perspective of the world” as one built on atomism, privileging “stability over instability, structure over process, objects over fields, and being over becoming.” Not a bad summary of the way most Western executives view the business landscape. There is a natural and very human tendency to seek out the typical or the average and to search for more predictability. By implication, a Paretian world requires a much more dynamic view of the world, one that looks for patterns in evolving relationships, rooted deeply in context, and that understands how these changing patterns reshape who we are as well as our opportunities for growth. McKelvey’s provocative work will help to challenge and shift our mindsets.
What types of things have Paretian distributions? Book sales are a good example. The average number of copies sold for a new book is around 100. The median is closer to zero. But Harry Potter and the Deathly Hallows has sold 44 million copies.
Movie revenues follow a Paretian distribution too. So do daily changes in stock prices. That’s an important one – imagine a financial model built on the assumption that stock price changes follow a normal distribution, but where in reality they follow a Paretian distribution. This means that the extremes are much more extreme than your model assumes.
And that pretty much explains how Long-Term Capital Management almost managed to blow up the financial system in 1998. And how Lehman and company almost managed to do it again in 2008.
Another thing that follows a Paretian distribution is returns to innovation. Most innovation is incremental, and the returns are small. But every once in a while, some new idea goes viral.
This causes problems. If you see all your innovation efforts resulting in small returns, and you are thinking in terms of Gaussian distributions, you will underinvest in innovation.
How can we combat this? There are a few steps we can take. Understanding that the economy is Paretian more than Gaussian is an important first step.
Second, since it is very hard to pick in advance which ideas will go viral, we need to place many small bets. I keep writing papers and blog posts, because I know that the more I do, the better my chances are of finding an idea that will hit big.
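The logic of many small bets can be sketched with a little arithmetic. If each idea has some small chance of hitting big, and the bets are independent, the chance of at least one hit after n bets is 1 - (1 - p)^n. The 2% hit rate below is purely hypothetical, and so is the independence assumption; the point is only the shape of the curve.

```python
def chance_of_at_least_one_hit(n_bets, p_hit):
    """Probability that at least one of n independent bets pays off."""
    return 1 - (1 - p_hit) ** n_bets

P_HIT = 0.02  # hypothetical: each idea has a 2% chance of hitting big

for n in (1, 10, 50, 100):
    chance = chance_of_at_least_one_hit(n, P_HIT)
    print(f"{n:>3} bets: {chance:.1%} chance of at least one hit")
```

Even with long odds on any single idea, the chance of finding at least one winner climbs steadily as the number of bets grows.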
Third, we can study and try to learn from the new ideas that are successful. This is similar to the positive deviance approach.
If you are operating in a Paretian world but you assume that it’s Gaussian, you’re heading for trouble. That’s why thinking about averages can be disastrous. Think about outliers instead.