Probability Distributions that every Data Scientist must know

Introduction

The probability of an event tells us how likely it is that the event will occur. The applications of probability begin with the numbers p_0, p_1, p_2, … that give the probability of each possible outcome.

There are dozens of famous and useful choices for these probabilities, and I will discuss four of them in this post. Before going deeper into probability distributions, one must understand random variables. A random variable is a variable whose values depend on the outcomes of a random event. There are two types of random variables:

  1. Discrete Random Variable: a random variable that takes a finite or countably infinite number of distinct values (for example, the result of a die roll).
  2. Continuous Random Variable: a random variable that can take any value in a continuous range, such as an interval of real numbers (for example, a person's height).
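To make the two definitions concrete, here is a minimal Python sketch using only the standard library. The fair die, the Uniform(0, 1) variable, and the helper name `uniform_interval_prob` are illustrative choices of ours, not something from the original post.

```python
import random

# Discrete random variable: one roll of a fair six-sided die.
# Its PMF assigns a probability to each of finitely many values.
die_pmf = {face: 1 / 6 for face in range(1, 7)}

# Continuous random variable: a number drawn uniformly from [0, 1).
# Any single exact value has probability 0, so probabilities are
# assigned to intervals instead: P(a <= X < b) = b - a inside [0, 1).
def uniform_interval_prob(a: float, b: float) -> float:
    """Hypothetical helper: P(a <= X < b) for X ~ Uniform(0, 1)."""
    lo, hi = max(a, 0.0), min(b, 1.0)
    return max(hi - lo, 0.0)

rng = random.Random(0)
roll = rng.randint(1, 6)  # one draw of the discrete variable
draw = rng.random()       # one draw of the continuous variable

print("die roll:", roll, "| uniform draw:", draw)
print("sum of die PMF:", sum(die_pmf.values()))
print("P(0.2 <= X < 0.5):", uniform_interval_prob(0.2, 0.5))
```

The PMF values of the discrete variable add up to 1, while the continuous variable only gets meaningful probabilities over ranges of values.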

In the following sections, we discuss four probability distributions. Links to their PMFs, PDFs, and graphs are listed in the References section below.

1. Binomial Distribution

In the Binomial Distribution, each trial has two possible outcomes, 1 or 0 (success or failure, heads or tails). The probability of success is p and the probability of failure is 1 - p. For example, a fair coin has a probability of 1/2 for heads and 1/2 for tails. The Binomial Distribution gives the probability of k successes in n independent trials: the number of heads in 10 tosses of a fair coin follows a Binomial Distribution with n = 10 and p = 1/2.
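As a minimal sketch (Python standard library only), the Binomial PMF P(X = k) = C(n, k) p^k (1 - p)^(n - k) can be computed directly with `math.comb`. The function name `binomial_pmf` is our own, not from any library.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): C(n, k) * p**k * (1 - p)**(n - k)."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Number of heads in 10 tosses of a fair coin: n = 10, p = 1/2.
n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(f"P(exactly 5 heads) = {pmf[5]:.4f}")  # prints 0.2461 (= 252 / 1024)
print(f"sum of PMF = {sum(pmf):.4f}")        # prints 1.0000
```

Five heads is the single most likely outcome, yet it happens in only about a quarter of all 10-toss experiments.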

2. Poisson Distribution

The Poisson Distribution is fascinating because there are different ways to approach it. One way is to connect it directly to the Binomial Distribution, with probability p of success in each trial and probability p_{k,n} of k successes in n trials. The Poisson Distribution then describes the limiting case of rare events over many trials, with an average of λ successes, where:

  1. p → 0, i.e., the success probability p of each trial is small (tends to zero).
  2. n → ∞, i.e., the number of trials n is very large (tends to infinity).
  3. np = λ, i.e., the average (expected) number of successes in n trials stays constant at λ.
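The limit described above can be checked numerically. This sketch assumes λ = 3 and looks at k = 2 successes (both arbitrary choices), comparing the Binomial probability with p = λ/n against the Poisson PMF λ^k e^(-λ) / k! as n grows. The helper names `poisson_pmf` and `binomial_pmf` are illustrative.

```python
from math import comb, exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): lam**k * exp(-lam) / k!."""
    return lam**k * exp(-lam) / factorial(k)

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

lam, k = 3.0, 2  # fixed expected count λ and the number of successes to compare
for n in (10, 100, 10_000):
    p = lam / n  # p shrinks as n grows, so n * p stays equal to λ
    print(f"n={n:>6}  binomial={binomial_pmf(k, n, p):.5f}  poisson={poisson_pmf(k, lam):.5f}")
```

As n increases, the Binomial column should move closer to the Poisson column, which is the limiting behavior the three conditions describe.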

The Poisson Distribution is associated with rare events. A few examples of counts that are often modeled with the Poisson Distribution are:

  1. The number of big meteors striking the Earth.
  2. The number of campers attacked by mountain lions.
  3. The number of failures of big banks.

One important point to note is that the Poisson Distribution assumes the events are independent. Some of the examples above might not fit this assumption: the failure of one bank, for instance, may be linked to the failure of other banks. The IID (independent and identically distributed) assumption does not always hold.

3. Exponential Distribution

The first continuous distribution in this post is the Exponential Distribution. It is the probability distribution of the time between events in a Poisson point process. For example: how long until lightning strikes your city, or how long until a big meteor strikes the Earth?
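One way to see the link between the Exponential and Poisson distributions is to simulate it. This sketch assumes a rate of 2 events per unit time (an arbitrary choice) and uses `random.Random.expovariate` from the Python standard library to draw exponential gaps between events. If the gaps are exponential, the number of events in each unit interval should behave like a Poisson count, whose mean and variance are both equal to the rate. The function name `poisson_process_counts` is hypothetical.

```python
import random

def poisson_process_counts(rate: float, n_intervals: int, seed: int = 0) -> list[int]:
    """Simulate events on [0, n_intervals) with Exponential(rate) gaps between
    them, and count how many events fall in each unit-length interval."""
    rng = random.Random(seed)
    counts = [0] * n_intervals
    t = rng.expovariate(rate)        # time of the first event
    while t < n_intervals:
        counts[int(t)] += 1          # unit interval this event falls into
        t += rng.expovariate(rate)   # exponential waiting time to the next event
    return counts

counts = poisson_process_counts(rate=2.0, n_intervals=100_000)
mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / len(counts)
# For a Poisson count, mean and variance should both come out close to the rate.
print(f"mean count per interval ≈ {mean:.3f}, variance ≈ {var:.3f}")
```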

Note: The Exponential Distribution is memoryless, meaning the remaining waiting time is independent of how long you have already waited.

In other words, the future is independent of the past. If failures come from slow decay (wear-out), so that the failure rate increases with age, this assumption of independence does not hold.
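The memoryless property can be written as P(T > s + t | T > s) = P(T > t), where P(T > t) = e^(-λt) for an Exponential Distribution with rate λ. The sketch below checks this both exactly and by simulation. The rate λ = 0.5 and the times s = 4, t = 2 are arbitrary values chosen for illustration.

```python
import math
import random

rate = 0.5  # λ, events per unit time (arbitrary illustrative value)

def survival(t: float) -> float:
    """P(T > t) for T ~ Exponential(rate)."""
    return math.exp(-rate * t)

s, t = 4.0, 2.0  # time already waited, additional time (arbitrary)

# Exact check: P(T > s + t | T > s) = P(T > s + t) / P(T > s).
conditional = survival(s + t) / survival(s)
print(f"P(T > s+t | T > s) = {conditional:.4f}, P(T > t) = {survival(t):.4f}")

# Simulation check: among waits that already exceeded s, the fraction
# that also exceed s + t should be close to P(T > t).
rng = random.Random(1)
waits = [rng.expovariate(rate) for _ in range(200_000)]
survivors = [w for w in waits if w > s]
frac = sum(w > s + t for w in survivors) / len(survivors)
print(f"simulated ≈ {frac:.4f}")
```

Having already waited 4 time units does not change the chance of waiting at least 2 more.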

4. Normal Distribution

The distribution at the center of probability theory is the Normal Distribution, also known as the Gaussian Distribution. It is the famous bell-shaped curve. The Normal Distribution emerges when you repeatedly draw random samples from almost any distribution (one with finite variance) and look at the distribution of the sample averages: as the sample size grows, the distribution of the average approaches a Normal Distribution. This is the Central Limit Theorem. An example of a quantity that approximately follows a Normal Distribution is the height of adults in your city.
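Here is a small simulation sketch of the Central Limit Theorem. Averages of Uniform(0, 1) samples (a flat distribution, not a bell-shaped one) should be approximately Normal with mean 1/2 and standard deviation sqrt(1/(12n)). The sample size n = 30 and the 20,000 repetitions are arbitrary choices, and `sample_means` is a hypothetical helper. `statistics.NormalDist` from the Python standard library supplies the reference normal probabilities.

```python
import random
import statistics
from statistics import NormalDist

def sample_means(sample_size: int, n_repeats: int, seed: int = 0) -> list[float]:
    """Average `sample_size` draws from Uniform(0, 1), repeated `n_repeats` times."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.random() for _ in range(sample_size))
        for _ in range(n_repeats)
    ]

n = 30
means = sample_means(sample_size=n, n_repeats=20_000)

# Uniform(0, 1) has mean 1/2 and variance 1/12, so the CLT predicts the
# averages are approximately Normal(1/2, sqrt(1 / (12 * n))).
mu, sigma = statistics.fmean(means), statistics.stdev(means)
print(f"mean of averages ≈ {mu:.4f} (predicted 0.5)")
print(f"sd of averages   ≈ {sigma:.4f} (predicted {(1 / (12 * n)) ** 0.5:.4f})")

# Compare one feature of the shape with a true normal distribution:
# the fraction of averages within one standard deviation of the mean.
within_1sd = sum(abs(m - mu) < sigma for m in means) / len(means)
normal_1sd = NormalDist().cdf(1) - NormalDist().cdf(-1)
print(f"within 1 sd: simulated ≈ {within_1sd:.3f}, normal = {normal_1sd:.3f}")
```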

References

  1. https://www.deeplearningbook.org/contents/prob.html
  2. https://en.wikipedia.org/wiki/Probability_distribution
  3. https://en.wikipedia.org/wiki/List_of_probability_distributions
  4. Probability Mass Functions and Probability Density Functions for the distributions discussed above.
    1. Binomial Distribution
    2. Poisson Distribution
    3. Exponential Distribution
    4. Normal Distribution
