Distributions with Thin and Fat Tails

In this post, we’ll explore a key difference between thin-tailed and fat-tailed distributions, and what it means for insurance and risk management. The results are from Chapter 3 of Taleb (2020).

In the world of thin tails (for example, data that are Gaussian distributed), as the sample gets larger, no single data point can materially change the properties of the whole sample (Taleb calls this “Mediocristan”). Conversely, in data from fat-tailed distributions (for example, a distribution whose tail decays like a power law), a single data point (a rare event) can disproportionately affect the properties of the sample (Taleb calls this “Extremistan”). The next section looks at data from a standard normal distribution and at how likely it is to observe a tail event in such data.

Gaussian Data (Mediocristan)

Summary: In data that are normally (Gaussian) distributed, it is more likely to observe several smaller “bad” events than one massive “tail” event. For insurability, the losses need to come from many smaller bad events rather than from a single tail event. This is the case for Gaussian data.

Taleb gives the example of checking the total height of two individuals randomly drawn from a sample of people (the heights are assumed to be normally distributed, i.e. our sample is from Mediocristan). If the total height is 4.1 m (a tail event), it is more likely that each individual is about 2.05 m tall than that one is 10 cm and the other 4 m.

Stated mathematically: if $x$ is a large threshold and $X$ is drawn from a Gaussian distribution (Mediocristan), then exceeding $x$ in each of two independent draws is more likely than exceeding $2x$ in a single draw, i.e. $P(X > x)^2 > P(X > 2x)$. Let us verify this by computing both probabilities for the standard normal distribution.

import scipy.stats as dist
# standard normal distribution
st_norm = dist.norm(0, 1)
# probability of observing a value more than 3 standard deviations
# above the mean (one-sided upper tail); sf is the survival function, 1 - cdf
p_x = st_norm.sf(3)
# probability of observing a value more than 6 = 3*2 standard deviations
# above the mean
p_2x = st_norm.sf(6)
# print the probabilities
print("Probability of observing two values that are each more than 3 standard deviations \n above the mean: {0:.3E}".format(p_x * p_x))

## Probability of observing two values that are each more than 3 standard deviations 
##  above the mean: 1.822E-06

print("Probability of observing a single value that is more than 6 standard deviations \n above the mean: {0:.3E}".format(p_2x))

## Probability of observing a single value that is more than 6 standard deviations 
##  above the mean: 9.866E-10

As expected, in the Gaussian world it is more likely to observe several smaller tail events than a single large tail event that changes the properties of the data sample. If these tail events are associated with losses, then for insurability the losses need to come from many (smaller) events rather than from a single (large) one that leads to financial ruin. The formal conditions for this were discussed in Cramér (1959).
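The comparison above uses a single threshold of 3 standard deviations. As a rough sketch of how the gap behaves at other thresholds, the snippet below computes the ratio $P(X > k)^2 / P(X > 2k)$ for a few values of $k$. The helper name `gaussian_tail_ratio` and the chosen thresholds are illustrative assumptions, not something taken from Taleb (2020).

```python
from scipy import stats

# Standard normal distribution, as in the example above.
st_norm = stats.norm(0, 1)


def gaussian_tail_ratio(k):
    """Return P(X > k)^2 / P(X > 2k) for a standard normal X.

    A ratio above 1 means two exceedances of k are more likely
    than a single exceedance of 2k.
    """
    # sf is the survival function (1 - cdf), evaluated without
    # the cancellation error of computing 1 - cdf directly.
    return st_norm.sf(k) ** 2 / st_norm.sf(2 * k)


# Thresholds (in standard deviations) chosen only for illustration.
thresholds = [1, 2, 3, 4]
ratios = [gaussian_tail_ratio(k) for k in thresholds]

for k, ratio in zip(thresholds, ratios):
    print("k = {0}: ratio = {1:.3E}".format(k, ratio))
```

For these thresholds the ratio stays above 1 and grows quickly with $k$, which is the Mediocristan pattern: the further out in the tail we look, the more a single huge event is dominated by several moderate ones.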

Data from a Pareto Distribution (Extremistan)

Summary: In data from subexponential distributions (like the Pareto), it is more likely to observe one massive “tail” event than several smaller “bad” events. Such risks are not insurable, since a single event can be catastrophic.

Let’s repeat the probability calculation we did earlier, but now for the Pareto distribution:

# set the shape parameter for the Pareto distribution
b = 3
# get the Pareto distribution object from dist, the alias for scipy.stats
st_pareto = dist.pareto(b)
# probability of observing a value greater than thrice the mean, i.e. 4.5
# (the mean is b / (b - 1) = 1.5); sf is the survival function, 1 - cdf
p_x = st_pareto.sf(4.5)
# probability of observing a value greater than 6 times the mean, i.e. 9
p_2x = st_pareto.sf(9)
# print the probability of values
print("Probability of observing two values that are thrice the mean: {0:.3E}".format(p_x * p_x))

## Probability of observing two values that are thrice the mean: 1.204E-04

print("Probability of observing a single value that is 6 times the mean: {0:.3E}".format(p_2x))

## Probability of observing a single value that is 6 times the mean: 1.372E-03

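Note that pareto() in scipy takes the shape parameter b as input, as described in the SciPy documentation for scipy.stats.pareto. With the default loc=0 and scale=1, the distribution has support $x \geq 1$ and mean $b/(b-1)$, which is 1.5 for $b = 3$.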
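As a quick sanity check on this parameterization, the sketch below compares scipy's values against the closed-form expressions for a Pareto distribution with minimum 1 and shape $b$: mean $b/(b-1)$ and survival function $P(X > x) = x^{-b}$. The variable names are illustrative, and the check assumes the default `loc=0` and `scale=1`.

```python
from scipy import stats

b = 3
# Frozen Pareto distribution with shape b and default loc=0, scale=1.
st_pareto = stats.pareto(b)

# Closed-form values for a Pareto with minimum 1 and shape b.
mean_closed_form = b / (b - 1)   # mean, finite only for b > 1
sf_closed_form = 4.5 ** (-b)     # P(X > 4.5) = 4.5^(-b)

print("mean:      scipy = {0:.6f}, closed form = {1:.6f}".format(
    st_pareto.mean(), mean_closed_form))
print("P(X>4.5):  scipy = {0:.6E}, closed form = {1:.6E}".format(
    st_pareto.sf(4.5), sf_closed_form))
# The cdf is zero at the lower end of the support, x = 1.
print("cdf(1.0) = {0}".format(st_pareto.cdf(1.0)))
```

If the two columns agree, the thresholds 4.5 and 9 used in the calculation are indeed 3 and 6 times the mean.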

As the two probabilities show, observing a single large tail event is more likely than observing two smaller tail events. If large tail events are bad and lead to financial ruin, such cases are uninsurable.
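For this Pareto distribution the survival function is $P(X > x) = x^{-b}$, so the ratio of the two probabilities has a simple closed form: $P(X > 2x) / P(X > x)^2 = (2x)^{-b} / x^{-2b} = (x/2)^b$. The single large event is therefore the more likely one whenever $x > 2$, and increasingly so further out in the tail. The sketch below checks this against scipy for a few thresholds; the helper name `pareto_tail_ratio` and the thresholds are illustrative assumptions.

```python
from scipy import stats

b = 3
# Pareto distribution with shape b, minimum 1 (scipy defaults).
st_pareto = stats.pareto(b)


def pareto_tail_ratio(x):
    """Return P(X > 2x) / P(X > x)^2 for the Pareto distribution above.

    A ratio above 1 means a single exceedance of 2x is more likely
    than two exceedances of x.
    """
    return st_pareto.sf(2 * x) / st_pareto.sf(x) ** 2


# Thresholds chosen only for illustration (2, 3, 6 and 12 times the mean).
thresholds = [3.0, 4.5, 9.0, 18.0]
pareto_ratios = [pareto_tail_ratio(x) for x in thresholds]

for x, ratio in zip(thresholds, pareto_ratios):
    closed_form = (x / 2) ** b
    print("x = {0:5.1f}: scipy ratio = {1:.3E}, (x/2)^b = {2:.3E}".format(
        x, ratio, closed_form))
```

At $x = 4.5$ the ratio is $2.25^3 \approx 11.4$, which matches the two probabilities printed earlier.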

To drive the point home, Taleb gives another example: checking the total income of two individuals randomly drawn from a sample of people (the incomes are assumed to be subexponentially distributed, i.e. our sample is from Extremistan). If the total income is $36 million (a tail event), it is more likely that the individual incomes are $35,999,000 and $1,000 than that both are $18 million.
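The height and income examples can be illustrated with a small simulation. The sketch below draws many pairs, keeps only the pairs whose total falls in the top 0.1%, and reports the median share of the total contributed by the larger of the two values. Everything numeric here is an assumption made for illustration: heights are taken as Normal with mean 1.7 m and standard deviation 0.1 m, incomes as Pareto with shape 3 and minimum 1 (arbitrary units), and the seed, sample size, and tail fraction are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n_pairs = 500_000

# Assumed for illustration: heights ~ Normal(mean 1.7 m, sd 0.1 m).
heights = rng.normal(loc=1.7, scale=0.1, size=(n_pairs, 2))

# Generator.pareto draws Lomax (Pareto II) values; adding 1 gives the
# classical Pareto with minimum 1, matching scipy.stats.pareto(3).
incomes = 1.0 + rng.pareto(3.0, size=(n_pairs, 2))


def larger_share_in_tail(pairs, tail_fraction=0.001):
    """Median share of the pair total held by the larger value,
    restricted to pairs whose total is in the top tail_fraction."""
    totals = pairs.sum(axis=1)
    cutoff = np.quantile(totals, 1 - tail_fraction)
    tail_pairs = pairs[totals >= cutoff]
    shares = tail_pairs.max(axis=1) / tail_pairs.sum(axis=1)
    return float(np.median(shares))


gaussian_share = larger_share_in_tail(heights)
pareto_share = larger_share_in_tail(incomes)

print("Gaussian heights: larger value holds {0:.1%} of the total".format(gaussian_share))
print("Pareto incomes:   larger value holds {0:.1%} of the total".format(pareto_share))
```

Under these assumptions the Gaussian share should come out close to one half (an extreme total is split roughly evenly), while the Pareto share should be well above it (an extreme total is mostly one extreme value). A smaller shape parameter would make the contrast sharper still.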

References

Cramér, Harald. 1959. On the Mathematical Theory of Risk. Centraltryckeriet.

Taleb, Nassim Nicholas. 2020. “Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications.” arXiv Preprint arXiv:2001.10488.