How to Win in Venture Capital: Focus on the Fat Tails

Aug 23, 2018

The greatest shortcoming of the human race is our inability to understand the exponential function.
— Albert Allen Bartlett
The biggest secret in venture capital is that the best investment in a successful fund equals or outperforms the entire rest of the fund combined.
— Peter Thiel

In What is Code?, Paul Ford points out how adaptable the profession of programming is to change. Programming languages written to solve one set of problems are often, perhaps inevitably, adapted to solve problems in another domain. For example, with the V8 engine, JavaScript became spanking-fast and could run outside the web browsers for which it was initially designed.

One day, JavaScript ran inside Web pages. Then it broke out of its browser prison. Now it could operate anywhere. It could touch your hard drive, send e-mail, erase all your files. It was a real programming language now. And the client … had become the server.

In the river “flow” of technological progress, one often observes trends toward:

abstraction (of information and processes)
adaptation (of one solution to different use cases)
acceleration (solutions enable other solutions)
universality (good patterns can be repeated anywhere)

These trends have resulted in highly skewed power law distributions (vs. the bell curve), where a small handful of people are “hyper high performers,” a broad swath of people are “good performers”, and a smaller number of people are “low performers.”

Power law distributions

Power law distribution vs. normal distribution

A major feature of power law distributions is that small outcomes are very likely while larger ones are less likely. In other words, a small number of inputs account for a large percentage of outputs. Power laws also exhibit “fat tails”: the probability of extreme outcomes declines slowly, whereas the tails of a normal distribution fall off much faster as you move farther along the x-axis.
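
To make the contrast concrete, here is a minimal Python sketch that compares tail probabilities. The exponent alpha = 2 and the cutoff x_min = 1 are illustrative assumptions rather than values from the text, and setting a normal variable k standard deviations out against a power-law variable at k times its minimum is only a rough comparison.

```python
import math

def normal_tail(k: float) -> float:
    """P(Z > k) for a standard normal variable Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def pareto_tail(x: float, alpha: float = 2.0, x_min: float = 1.0) -> float:
    """P(X > x) when the density is proportional to x**(-alpha) for x >= x_min.

    alpha and x_min are assumed values chosen for illustration.
    """
    if x < x_min:
        return 1.0
    return (x / x_min) ** (-(alpha - 1.0))

# Compare how quickly each tail probability shrinks.
for k in (2, 5, 10):
    print(f"k={k:>2}  normal: {normal_tail(k):.3e}  power law: {pareto_tail(k):.3e}")
```

The normal tail collapses toward zero within a few standard deviations, while the power-law tail shrinks only in proportion to a power of x.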

Many prominent entrepreneurs and venture capitalists assert that VC returns are distributed according to a power law. As Marc Andreessen of Andreessen Horowitz points out, each year, of the 4,000 technology startups seeking VC funding, only 200 (or 5%) are seriously fundable, with “15 of those generating 95% of all economic returns…even the top VCs write off half their deals.” Billionaire tech investor Peter Thiel concurs, “[W]e don’t live in a normal world; we live under a power law. … [I]n venture capital, where investors try to profit from exponential growth in early-stage companies, a few companies attain exponentially greater value than all others. … Bad VCs tend to think the dashed line is flat, i.e. that all companies are created equal, and some just fail, spin wheels, or grow. In reality you get a power law distribution.”

In Zero to One, Thiel further elaborates:

In 1906, economist Vilfredo Pareto discovered what became the “Pareto Principle,” or the 80–20 rule, when he noticed that 20% of the people owned 80% of the land in Italy — a phenomenon that he found just as natural as the fact that 20% of the peapods in his garden produced 80% of the peas. This extraordinarily stark pattern, when a small few radically outstrip all rivals, surrounds us everywhere in the natural and social world. The most destructive earthquakes are many times more powerful than all smaller earthquakes combined. The biggest cities dwarf all mere towns put together. And monopoly businesses capture more value than millions of undifferentiated competitors. Whatever Einstein did or didn’t say, the power law — so named because exponential equations describe severely unequal distributions — is the law of the universe. It defines our surroundings so completely that we usually don’t even see it.
…
What does the distribution of returns in a venture fund look like? The naïve response is just to rank companies from best to worst according to their return in multiple of dollars invested. People tend to group investments into three buckets. The bad companies go to zero. The mediocre ones do maybe 1x, so you don’t lose much or gain much. And then the great companies do maybe 3–10x.
But that model misses the key insight that actual returns are incredibly skewed. The more a VC understands this skew pattern, the better the VC. Bad VCs tend to think the dashed line is flat, i.e. that all companies are created equal, and some just fail, spin wheels, or grow. In reality you get a power law distribution.
…
Indeed, the single most powerful pattern I have noticed is that successful people find value in unexpected places, and they do this by thinking about business from first principles instead of formulas.

VCs often write off as many as half of their investments. Historical data confirm that most investments return very little or even lose money, and many others return only a modest multiple of the initial investment. It is the very small handful of “hyper performers,” whose outcomes fall well outside what a normal distribution would predict, that not only more than compensate for the losses but also generate most of a portfolio’s returns. This last point illustrates the idea of “fat tails”: because the tails have more bulk, the probability of extreme events is higher than under a normal distribution. Portfolios are constructed around the idea that these unlikely but high-value outcomes will drive returns, even though small-to-negative returns are the most probable outcome for any given company in the portfolio.

Power laws have a property that normal distributions do not: fat tails. The farther out you go along the x-axis, the faster normal curves drop off relative to power law curves.

The most experienced and successful venture capitalists grok the concept of the power law and how it impacts the outcomes of startup investments. In fact, the power law is so common that Peter Thiel considers it a crucial concept for all business people to understand.

The quest for unicorns

The power law implies two rules for VCs, says Peter Thiel.

This implies two very strange rules for VCs. First, only invest in companies that have the potential to return the value of the entire fund. … This leads to rule number two: because rule number one is so restrictive, there can’t be any other rules.
…[L]ife is not a portfolio: not for a startup founder, and not for any individual. An entrepreneur cannot “diversify” herself; you cannot run dozens of companies at the same time and then hope that one of them works out well. Less obvious but just as important, an individual cannot diversify his own life by keeping dozens of equally possible careers in ready reserve.

In his class at Stanford, Thiel hammers home the importance of understanding power laws:

Consider a prototypical successful venture fund. A number of investments go to zero over a period of time. Those tend to happen earlier rather than later. The investments that succeed do so on some sort of exponential curve. Sum it over the life of a portfolio and you get a J curve. Early investments fail. You have to pay management fees. But then the exponential growth takes place, at least in theory. Since you start out underwater, the big question is when you make it above the water line. A lot of funds never get there.

Thiel explains how investors can apply the mental model of power laws (more from Masters’ notes on Class 7):

A better model is to invest in maybe 7 or 8 promising companies from which you think you can get a 10x return. It’s true that in theory, the math works out the same if you try investing in 100 different companies that you think will bring 100x returns. But in practice that starts looking less like investing and more like buying lottery tickets.

The key to all mental models is knowing the facts and being able to use the concept. As George E. P. Box said, “All models are wrong, but some are useful.” We humans have a hard time grasping exponential curves and the implications of the power law. For example, if you apply the 80–20 rule to the top 20% of venture capital deals, then 4% of the deals produce 64% of all returns. Apply the 80–20 rule again to the top 4% of deals and you get 0.8% of deals producing 51.2% of all returns. The implication is that roughly 1 in 25 deals may produce about 2/3 of all returns and roughly 1 in 125 deals may return more than all other deals combined.
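
The repeated 80–20 arithmetic can be sketched in a few lines of Python. The function name and its parameters are hypothetical, introduced only for illustration; the 20% and 80% defaults come from the rule itself.

```python
def iterate_pareto(rounds: int, top_share: float = 0.2, return_share: float = 0.8):
    """Apply the 80-20 rule repeatedly to the top slice of deals.

    Returns a list of (fraction_of_deals, fraction_of_returns) pairs,
    one pair per application of the rule.
    """
    results = []
    deals, returns = 1.0, 1.0
    for _ in range(rounds):
        deals *= top_share        # keep only the top slice of the previous slice
        returns *= return_share   # that slice earns this share of the prior returns
        results.append((deals, returns))
    return results

for deals, returns in iterate_pareto(3):
    print(f"top {deals:.1%} of deals (about 1 in {1 / deals:.0f}) "
          f"-> {returns:.1%} of returns")
```

Three rounds reproduce the figures in the paragraph above: 20% of deals for 80% of returns, then 4% for 64%, then 0.8% for 51.2%.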

Hence the VC world’s quest for those rare, home-run deals has led to the rise of “unicorns,” a term coined by Aileen Lee to describe startups whose valuations have risen to $1 billion or more.

Quartz’s June 2015 study on the biggest “unicorn makers” in the VC world listed the top 14 VCs, who hunted down 93 unicorns that collectively represented just 2.6% of their deals. Aileen Lee estimated that only 0.07%, or 1 in 1,538, of tech startups reach unicorn status. Comparing the two rates (2.6% vs. 0.07%), a startup funded by one of the top VCs is nearly 40 times more likely to become a unicorn than a random startup.

Even so, the cold, hard reality is that unicorns are rare even among top portfolios. Sequoia Capital, which had the highest number of unicorns (total of 17) in the study, saw only 1 in 20 companies reach unicorn status. Peter Thiel’s Founders Fund had five unicorns, which represented less than 1 in 30 companies.

Paradox for VCs

Despite the deeply ingrained power-law thinking in the VC industry, one can’t help asking, are venture capital returns really power-law distributed? Power law distributions can be hard to distinguish from the tail of a log normal distribution or from a mixture of several exponential distributions. People fit the data to all of these. Some have tried to debunk the myth of power laws.

Peter Thiel criticizes the VC industry by making the observation that most VCs spend 80% of their time on “the losers.” His advice is to spend much more time on the small handful of big winners. Just take a look at the boards of successful VC-backed companies. Their boards tend to get larger and larger as each round is bid up and led by new investors. Furthermore, not only do all board members attend every meeting, all official and sometimes non-official board “observers” start to show up (some VC firms may bring more than one person to meetings, even if they are not on the board). Do VCs pick winners or build them?

Conversely, at companies that are not successful, board attendance may shrink. Some VCs may stop showing up in person, dialing in instead or sending a replacement (usually a junior partner or an industry executive). If things start to look really lousy, some VCs may simply resign from the board and walk away. As the saying goes, “success has many fathers and failure is an orphan.”

In contrast, Fred Wilson of Union Square Ventures (USV) takes a different approach — he spends most of his time with the losers, the “long tail of investments that don’t move the needle for the VC fund.” Why would he do that? Because the “long tail consists of entrepreneurs and their teams. People who have given years of their lives to a dream that was ultimately not realized.” Some VCs not only understand but also empathize with the human costs of startups.

Some may consider this irrational behavior from a fund’s economics perspective. However, in light of a VC’s reputation, it makes sense. VCs touch the lives of many entrepreneurs, most of whom will not achieve great success. Therefore, it’s incumbent upon VCs to treat the “long tail” with respect, not as losers. Ironically, USV’s unicorn-spotting rate is among the highest in the industry (8.06% vs. 2.5% average of other top VC firms).

The venture capital firms with the most foresight, as measured by the percent of early startups invested in that eventually became worth a billion dollars or more. Data: Pitchbook (2015)

“The problem companies can actually take up more of your time than the successful one,” says Roelof Botha of Sequoia Capital. Thus, the paradox for VCs is that although they may want to focus 80% of their time on winners, they often wind up spending more time on the long tail.

There are also two pragmatic reasons why many VCs spend most of their time with long-tail companies.

It’s impossible to know a priori which fledgling startups will become huge winners. Even the very top VCs who swing for the fences every time get it right less than 1 out of 30 times.
Most winners need less help compared to other less-successful startups. As the winners gain momentum, they are able to attract even better talent and have plenty of funding and resources to accomplish their missions.

There are many levels of high performance, and the population of companies below the “hyper performers” is distributed among “near hyper-performers” all the way down to “low performers.” There will be a large group of “high-potentials,” a group who are “potential high-potentials,” and a small group who just don’t fit at all. The power law curve reflects the idea that “we want everyone to become a hyper-performer” if they can find the right fit, and that we don’t limit people at the top of the curve — we try to create more of them.

From this point of view, it’s not hard to understand the rationale behind Fred Wilson’s decision to nurture the long tail of investments. Companies that understand this model focus very heavily on collaboration, professional development, coaching, and empowering people to do great things. For example, retailers like Costco give employees “slack time” to clean up, fix things, and rearrange the store to continuously improve the customer experience.

How to get lucky: go long on a fat tail

The economic world is driven primarily by random jumps. Yet the common tools of finance were designed for random walks in which the market always moves in baby steps. Despite increasing empirical evidence that concentration and jumps better characterize market reality, the reliance on the random walk, the bell-shaped curve, and their spawn of alphas and betas is accelerating, widening a tragic gap between reality and the standard tools of financial measurement.
— Benoit Mandelbrot and Nassim Nicholas Taleb, “How the Finance Gurus Get Risk All Wrong”

Betting against a power law return (remember Long-Term Capital Management?) can result in some nasty surprises, but going long on a fat tail is a good bet, so long as you can make enough investments and are patient enough to find the rare anomaly. This way you sacrifice predictability, and that’s what venture capital is all about. (In fact, the fat tail of the power law curve is systematically under-invested in. Amazon actually generates 57% of sales from long-tail keywords, instead of the top 20% most popular keywords.)

A simple equation for a power law distribution is

$$p(x) = C x^{-\alpha}$$

where alpha (α) defines the shape of the power law and C is a normalization constant that makes the total area under the curve equal 1. (Not to be confused with the alpha of the capital asset pricing model, which denotes the part of an investor’s return attributed to skill rather than luck.) The expression only makes sense for α > 1, which is indeed a requirement for a power-law form to normalize. Power law distributions do not have a finite average if alpha is 2 or less, and they do not have a finite standard deviation if alpha is 3 or less.
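
As a rough illustration of these thresholds, the sketch below computes the normalization constant, mean, and standard deviation for a density proportional to x**(-alpha). It assumes the density applies only above a lower cutoff x_min, which the text does not state; x_min and all function names are hypothetical.

```python
import math

def normalization_constant(alpha: float, x_min: float = 1.0) -> float:
    """C such that C * x**(-alpha) integrates to 1 over x >= x_min (assumed cutoff)."""
    if alpha <= 1:
        raise ValueError("a power law only normalizes for alpha > 1")
    return (alpha - 1.0) * x_min ** (alpha - 1.0)

def mean(alpha: float, x_min: float = 1.0) -> float:
    """Mean of the distribution; infinite when alpha <= 2."""
    if alpha <= 1:
        raise ValueError("a power law only normalizes for alpha > 1")
    if alpha <= 2:
        return math.inf
    return x_min * (alpha - 1.0) / (alpha - 2.0)

def std_dev(alpha: float, x_min: float = 1.0) -> float:
    """Standard deviation; infinite when alpha <= 3."""
    if alpha <= 1:
        raise ValueError("a power law only normalizes for alpha > 1")
    if alpha <= 3:
        return math.inf
    second_moment = x_min ** 2 * (alpha - 1.0) / (alpha - 3.0)
    return math.sqrt(second_moment - mean(alpha, x_min) ** 2)

for a in (1.96, 2.5, 4.0):
    print(f"alpha={a}: C={normalization_constant(a):.2f}, "
          f"mean={mean(a)}, std={std_dev(a)}")
```

With alpha near 2, as discussed for venture capital below, the mean is already infinite, which is one way to see why averages are a poor summary of such returns.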

In power law distributions, lower alphas mean fatter tails.

The most important thing about a power law distribution is the alpha. The smaller the alpha, the heavier or fatter the right tail of the curve. Once you’ve kept alpha under 2, why not keep going? The fatter the tail, the higher the probability of outsize events. Once you’ve sacrificed predictability, you’re in for a penny, so why not be in for a pound?

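Venture capitalists hold investments for an average of 4 years. They expect year-over-year growth of about 30%, which corresponds to a continuously compounded growth rate of about 26% (ln 1.30 ≈ 0.26). Under a simple model in which company value grows exponentially at rate g and holding periods are exponentially distributed with mean T, exit multiples follow a power law with alpha = 1/(g * T) + 1. With these numbers, the model gives us an alpha of (1/(.26 * 4)) + 1 = 1.96. Why do the VC alphas cluster so closely around 2, the alpha at which the mean goes to infinity? Why not even lower?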
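
The calculation above can be reproduced with a short Python sketch. It assumes the relationship alpha = 1 + 1/(g * T), where g is the continuously compounded growth rate and T is the average holding period, as implied by the formula in the text; the function names are hypothetical.

```python
import math

def continuous_rate(yoy_growth: float) -> float:
    """Convert year-over-year growth (0.30 for 30%) to a continuously compounded rate."""
    return math.log(1.0 + yoy_growth)

def implied_alpha(yoy_growth: float, mean_years_held: float) -> float:
    """alpha = 1 + 1/(g * T), the assumed relationship between growth, holding time, and alpha."""
    g = continuous_rate(yoy_growth)
    return 1.0 + 1.0 / (g * mean_years_held)

print(f"continuous rate: {continuous_rate(0.30):.4f}")
print(f"implied alpha:   {implied_alpha(0.30, 4):.2f}")
```

Without rounding the growth rate to 0.26 first, the result comes out at about 1.95 rather than 1.96, still just under 2.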

One reason is timing. If VCs have a fund life of 10 years and they invest in the first 2–3 years, they have 7–8 years to realize gains. Assuming exits are distributed exponentially,

if VCs want to exit 80% of their investments within 8 years of making them, they need to have an average time-to-exit of about 5 years;
if they want to exit 90%, they need to shorten their average time-to-exit to about 3.5 years.
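
These two figures follow from the exponential assumption: if time-to-exit is exponentially distributed with mean T, the fraction exited within a window of w years is 1 - exp(-w / T). A minimal sketch, with hypothetical function names:

```python
import math

def fraction_exited(mean_years_to_exit: float, window_years: float) -> float:
    """Share of investments exited within the window, for exponential exit times."""
    return 1.0 - math.exp(-window_years / mean_years_to_exit)

def required_mean_time_to_exit(target_fraction: float, window_years: float) -> float:
    """Mean time-to-exit needed so that target_fraction exits within the window."""
    if not 0.0 < target_fraction < 1.0:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    return window_years / -math.log(1.0 - target_fraction)

for target in (0.80, 0.90):
    t = required_mean_time_to_exit(target, 8)
    print(f"exit {target:.0%} within 8 years -> mean time-to-exit of {t:.2f} years")
```

The results round to the "about 5 years" and "about 3.5 years" quoted above.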

This means that investing in patents — which comes with an alpha somewhere between 1.3 and 1.7 — is out, since it would take too long to realize the investment.

For a given alpha, a shorter time-to-exit requires a larger growth rate. If it takes 20 years to exit a patent (alpha = 1.5), it implies a year-over-year growth rate of about 10%. If VCs wanted to exit within 5 years, they would need a year-over-year growth rate of nearly 50%. To get to an alpha close to 2, as in venture capital, with an average time-to-exit of 5 years, the year-over-year growth rate of the portfolio companies needs to be 22% on average. For a time-to-exit of 3.5 years, the growth rate needs to be 33%. These are certainly high growth rates, and if the best VCs are the ones who can maintain the lowest alphas, then they are the ones who have the highest growth rates in their portfolios.
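
The growth rates in this paragraph can be checked by inverting the same assumed relationship, alpha = 1 + 1/(g * T), to solve for the continuously compounded rate g and then converting it to a year-over-year figure. The function name is hypothetical.

```python
import math

def required_yoy_growth(alpha: float, mean_years_to_exit: float) -> float:
    """Year-over-year growth implied by alpha = 1 + 1/(g * T), solved for g."""
    if alpha <= 1:
        raise ValueError("alpha must be greater than 1")
    g = 1.0 / ((alpha - 1.0) * mean_years_to_exit)  # continuously compounded rate
    return math.exp(g) - 1.0                         # convert to year-over-year

scenarios = [
    ("patents, 20-year exit", 1.5, 20),
    ("patents, 5-year exit", 1.5, 5),
    ("venture, 5-year exit", 2.0, 5),
    ("venture, 3.5-year exit", 2.0, 3.5),
]
for label, alpha, years in scenarios:
    print(f"{label}: {required_yoy_growth(alpha, years):.1%}")
```

The four scenarios come out at roughly 10%, 49%, 22%, and 33%, matching the figures in the paragraph.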

With finite variances and means, log normal and similar skew distributions are better behaved statistically. Still the more rightward-skewed the distribution is, whether Pareto-Levy, log normal, or some related form, the more difficult it is to hedge against risk by supporting sizable portfolios of innovation projects. The potential variability of economic outcomes with Pareto-Levy distributions is so great that large portfolio draws from year to year can have consequences for the macroeconomy.
— Frederic M. Scherer, “The Size Distribution of Profits from Innovation”

A lower alpha is better, but achieving one is constrained by the need to find enough companies that can generate the required high growth within the time a VC has to complete a cycle of investing and exiting. (Curiously, these forces seem to balance out very close to the point where the mean of the power law distribution goes to infinity.) The best explanation? Supply and demand. When alphas of less than 2 are available, i.e., the supply of high-growth companies has increased, VCs have a strong incentive to make more investments in those high-growth companies, so they raise more money and start more funds, which in turn increases the demand for those companies until the alpha returns to 2. The same old “reversion to the mean.” Just take a look at industry alphas around dramatic changes in the amount invested in VC.

At a given alpha, the more investments you make the better, because your mean return multiple increases with the number of investments, as does the likeliest highest multiple. As Dave McClure notes,

Most VC funds are far too concentrated in a small number (<20–40) of companies. The industry would be better served by doubling or tripling the average # of investments in a portfolio, particularly for early-stage investors where startup attrition is even greater. If unicorns happen only 1–2% of the time, it logically follows that portfolio size should include a minimum of 50–100+ companies in order to have a reasonable shot at capturing these elusive and mythical creatures.
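
A rough way to see the portfolio-size argument is sketched below. It assumes every investment is an independent draw with the same unicorn rate and that return multiples follow a power law with lower cutoff 1; both are simplifications not stated in the text. The median of the best multiple stands in for the "likeliest highest multiple," and all function names are hypothetical.

```python
import math

def prob_at_least_one(hit_rate: float, n: int) -> float:
    """Chance that a portfolio of n independent investments holds at least one unicorn."""
    return 1.0 - (1.0 - hit_rate) ** n

def min_portfolio_size(hit_rate: float, target_prob: float) -> int:
    """Smallest n for which prob_at_least_one(hit_rate, n) reaches target_prob."""
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - hit_rate))

def median_best_multiple(n: int, alpha: float) -> float:
    """Median of the largest of n draws from a power law with x_min = 1 (assumed)."""
    return (1.0 - 0.5 ** (1.0 / n)) ** (-1.0 / (alpha - 1.0))

for n in (20, 50, 100):
    print(f"n={n:>3}: P(>=1 unicorn at 1%) = {prob_at_least_one(0.01, n):.0%}, "
          f"at 2% = {prob_at_least_one(0.02, n):.0%}, "
          f"median best multiple (alpha=1.96) = {median_best_multiple(n, 1.96):.0f}x")
```

Under these assumptions, a 20-company portfolio with a 1% unicorn rate captures at least one unicorn less than a fifth of the time, and the median best multiple rises steadily with portfolio size.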

McClure believes he can find hundreds of companies in the fat tail with high enough growth rates to maintain his requisite alpha. Thiel thinks it’s not possible. Venture capitalists have always faced this dilemma: The average growth rate of all small businesses in the US is closer to 7.5% than 30%. The pool of companies that can grow fast enough is rather limited. How many companies can you find that will grow meteorically, knowing that when you’re wrong about the growth rate you’re probably terribly wrong?

Implications for startup founders

Since the track record of VCs is overwhelmingly skewed by a tiny handful of “unicorns,” entrepreneurs who try to assess the reputation of VCs by only looking at home-runs may get a skewed view.

In good times, investors will be supportive. But how will they behave during bad times? Even great companies go through ups and downs. If your startup is not one of the big winners (which is likely, based on probabilities), how will your VCs behave? Will they abandon ship — or worse, will they turn negative or downright hostile?

Before taking VC funding, it’s critical for startup founders to talk not only to the winners but also — more crucially — to the long-tail companies. You may learn wisdom from “failure” much more than from success. You may discover what will do, by finding out what will not do, and discover what will be a truly meaningful investment for your company.

The unspoken truth is that the best way to make money might be to promise everyone help but then actually help the ones who are going to provide the best returns.
— Peter Thiel

For every winner, there will be many more losers. There will be skeletons in every VC’s closet (i.e., disgruntled entrepreneurs), so be realistic about how you assess VCs and what they can do for you.

At the end of the day, entrepreneurs, not VCs, build companies. All VCs claim to help (“add value”), but they won’t do your job for you. Furthermore, even the most supportive VCs won’t keep investing in your company at rising valuations if your company is not performing. Even if more investment is not on the table, some VCs will work with companies to the end, treating people with fairness and respect. Some VCs will not. You’ll get a much better sense for this when you talk to the long tail.

Building a startup is a long game, and you may win without VCs at all. Remember, “it is not the most intellectual of the species that survives; it is not the strongest that survives; but the species that survives is the one that is able best to adapt and adjust to the changing environment in which it finds itself.”[0]

——

[0] The quote is often attributed to Charles Darwin, but is actually from the writings of Leon C. Megginson, Professor of Management and Marketing at Louisiana State University at Baton Rouge.