
# Continuous Random Variables


In the last tutorial we looked at discrete random variables. In this one, let us look at random variables that model problems with continuous outcomes.

Continuous Random Variables
Def:
A continuous random variable is a function that maps the sample space of a random experiment to an interval of the real numbers. A random variable is called continuous if there is an underlying function $f(x)$ such that \begin{equation} P(p \le X \le q) = \int_{p}^{q} f(x)dx \end{equation} $f(x)$ is a non-negative function called the probability density function (pdf). The pdf takes the value 0 for values outside $range(X)$. From the rules of probability, \begin{equation} P(-\infty < X < \infty) = \int_{-\infty}^{\infty} f(x)dx = 1 \end{equation}

Note: A probability density function is different from a probability mass function. $f(x)$ is not itself a probability, so it is not bounded above by 1; probabilities are obtained only by integrating $f(x)$ over an interval.

The probability that a continuous random variable takes a value in a given range is the area under the curve of $f(x)$ over that range.

Example 1: $f(x) = x^2$ is the pdf of a continuous random variable X. What is the probability that X takes a value in $[0.5, 1]$?
Solution:

Integrating $f(x)$ gives $x^3/3$, so for the given interval $P(0.5 \le X \le 1) = \frac{1^3 - 0.5^3}{3} = \frac{0.875}{3} \approx 0.292$.
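The area-under-the-curve reading of Example 1 can be checked numerically. The following is a minimal sketch, assuming that $[0.5, 1]$ lies inside the range of X; the helper `integrate` is a hypothetical name for a plain midpoint rule, not part of any library.

```python
def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def pdf(x):
    # Density from Example 1 (assumed valid on an interval containing [0.5, 1]).
    return x ** 2


# Numerical area under the pdf between 0.5 and 1.
prob = integrate(pdf, 0.5, 1.0)

# Same quantity from the antiderivative x^3 / 3 evaluated at the endpoints.
exact = (1.0 ** 3 - 0.5 ** 3) / 3

print(round(prob, 3), round(exact, 3))
```

Both numbers should agree with the value 0.292 obtained in the solution.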

Cumulative distribution function
The cdf plays the same role as for discrete random variables: it gives the probability that the random variable takes a value less than or equal to a given value.
Def: The cumulative distribution function (cdf) is given by $Cdf(X \le c) = P(X \le c) = \int_{-\infty}^{c} f(x)dx$
Example 2:
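X is a random variable with range [2, 4] and pdf $f(x) = x/6$. What is the value of $Cdf(X \le 2.5)$?

Solution:
$Cdf(X \le 2.5) = \int_{2}^{2.5} \frac{x}{6}dx = \frac{2.5^2 - 2^2}{12} = 0.1875$.
In general $Cdf(X \le a) = \frac{a^2 - 4}{12}$ for $2 \le a \le 4$. The cdf at a few values of $a$ is as follows: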

| $a$ | $Cdf(X \le a)$ |
|---|---|
| $a < 2$ | $0$ |
| $2.5$ | $0.1875$ |
| $3$ | $0.4167$ |
| $3.5$ | $0.6875$ |
| $4$ | $1$ |
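The table values follow from integrating $x/6$ from 2 up to $a$. As a small sketch (the function name `cdf` is only illustrative), the closed form $(a^2 - 4)/12$ reproduces each row:

```python
def cdf(a):
    """Closed-form cdf for Example 2: pdf x/6 on the range [2, 4]."""
    if a < 2:
        return 0.0          # no probability mass below the range
    if a > 4:
        return 1.0          # all probability mass is at or below 4
    return (a ** 2 - 4) / 12  # integral of x/6 from 2 to a


for a in (2, 2.5, 3, 3.5, 4):
    print(a, round(cdf(a), 4))
```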

Note:
Unlike $f(x)$, $cdf(x)$ is a probability and hence satisfies $0 \le cdf(x) \le 1$. Because $f(x)$ is non-negative, $cdf(x)$ is always a non-decreasing function.
An important property is $cdf^{'}(x) = f(x)$.
The value of $cdf(x)$ goes to 0 as $x \to -\infty$ and to 1 as $x \to \infty$.
A continuous random variable is generally specified by its cdf. For example: X is a random variable with distribution $cdf(x)$.
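These properties can be checked on Example 2. The sketch below uses a central finite difference (the helper `derivative` and the step `h` are illustrative choices) to compare the slope of the cdf with the pdf, and verifies that the cdf never decreases on a grid of points.

```python
def pdf(x):
    """pdf of Example 2: x/6 on [2, 4], 0 elsewhere."""
    return x / 6 if 2 <= x <= 4 else 0.0


def cdf(x):
    """cdf of Example 2, obtained by integrating the pdf."""
    if x < 2:
        return 0.0
    if x > 4:
        return 1.0
    return (x ** 2 - 4) / 12


def derivative(g, x, h=1e-6):
    """Central finite-difference approximation of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


# The slope of the cdf should match the pdf inside the range.
for x in (2.5, 3.0, 3.5):
    print(x, derivative(cdf, x), pdf(x))

# The cdf evaluated on an increasing grid should never decrease.
grid = [1 + 0.1 * i for i in range(41)]
values = [cdf(x) for x in grid]
is_non_decreasing = all(a <= b for a, b in zip(values, values[1:]))
print(is_non_decreasing)
```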

Some specific distributions

Uniform distribution

Again starting with the simplest of all distributions, the uniform distribution models scenarios where all outcomes are equally likely. The difference now is that the outcomes form an interval of real values rather than a discrete set. For example, $X = Uniform([c,d])$ means that all values of $x$ with $c \le x \le d$ are equally likely, with pdf $f(x) = \frac{1}{d-c}$ on $[c,d]$ and 0 elsewhere.
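As an illustration, the probability of a sub-interval under Uniform([c,d]) is just its length divided by $d-c$. The sketch below uses hypothetical endpoints $c = 2$, $d = 6$ and compares the exact value with a simulation based on Python's standard `random.uniform`.

```python
import random


def uniform_pdf(x, c, d):
    """pdf of Uniform([c, d]): constant on the interval, 0 elsewhere."""
    return 1 / (d - c) if c <= x <= d else 0.0


def uniform_cdf(x, c, d):
    """cdf of Uniform([c, d])."""
    if x < c:
        return 0.0
    if x > d:
        return 1.0
    return (x - c) / (d - c)


c, d = 2.0, 6.0  # hypothetical interval, chosen only for this example

# P(3 <= X <= 4) as a difference of cdf values.
p_interval = uniform_cdf(4.0, c, d) - uniform_cdf(3.0, c, d)

# Empirical check: fraction of simulated draws that land in [3, 4].
random.seed(0)
samples = [random.uniform(c, d) for _ in range(100_000)]
empirical = sum(3.0 <= s <= 4.0 for s in samples) / len(samples)

print(p_interval, empirical)
```

The exact value is $(4-3)/(6-2) = 0.25$, and the simulated fraction should be close to it.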

Exponential distribution

The exponential distribution is defined using a parameter $\lambda > 0$ and has pdf $f(x) = \lambda e^{-\lambda x}, x \ge 0$.
The exponential distribution is typically used to model the waiting time for an event to occur. A popular example is the waiting time for the decay of a radioactive isotope, which is exponentially distributed; here $\lambda$ is the decay constant (rate) of the isotope, and its half-life is $\ln(2)/\lambda$.
Another important aspect of this distribution is its lack of memory. When the waiting time of an activity is modeled using an exponential distribution, the probability of the event happening in the next $n$ minutes remains the same irrespective of the time $w$ that has already passed.

Proof:
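According to the claim above, we have to prove that $P(X > n + w| X > w) = P(X > n)$. First note that $P(X > x) = \int_{x}^{\infty} \lambda e^{-\lambda t}dt = e^{-\lambda x}$ for $x \ge 0$. Since the event $X > n + w$ implies $X > w$, \begin{equation} P(X > n + w | X > w) = \frac{P(X > n + w)}{P(X > w)} \end{equation} \begin{equation} \frac{P(X > n + w)}{P(X > w)} = \frac{e^{-\lambda (n+w)}}{e^{-\lambda w}} = e^{-\lambda n} = P(X > n) \end{equation}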
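The memoryless property can also be checked numerically. The following sketch uses hypothetical values for $\lambda$, $n$ and $w$, and a helper `survival` (an illustrative name) for $P(X > x) = e^{-\lambda x}$.

```python
import math


def survival(x, lam):
    """P(X > x) for an exponential random variable with rate lam."""
    return math.exp(-lam * x) if x >= 0 else 1.0


lam = 0.5          # hypothetical rate parameter
n, w = 3.0, 10.0   # hypothetical "next n minutes" and "time already waited"

# P(X > n + w | X > w) computed as a ratio of survival probabilities.
conditional = survival(n + w, lam) / survival(w, lam)

# P(X > n), which the proof says must be the same number.
unconditional = survival(n, lam)

print(conditional, unconditional)
```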

Normal Distribution

The normal or Gaussian distribution is one of the most widely used and important continuous distributions. It is denoted by $N(\mu,\sigma^{2})$ where $\mu$ is the mean and $\sigma^{2}$ is the variance of the distribution. The standard normal distribution, denoted by Z, is the normal distribution with $mean = 0$ and $variance = 1$.

The normal distribution is used to model statistics of large data sets, measurement errors in collected data, etc. The pdf of the standard normal distribution is symmetric about the y-axis and follows a bell curve.

| | $Uni(c,d), c \le x \le d$ | $Exp(\lambda), x \ge 0$ | $N(\mu,\sigma^{2}), x \in \mathbb{R}$ |
|---|---|---|---|
| pdf | $\frac{1}{d-c}$ | $\lambda e^{-\lambda x}$ | $\frac{1}{\sqrt {2\sigma^{2}\pi}}e^{\frac{-(x-\mu)^{2}}{2\sigma^{2}}}$ |
| cdf | $\frac{x-c}{d-c}$ | $1-e^{-\lambda x}$ | $\frac{1}{2}[1+\operatorname{erf}(\frac{x-\mu}{\sigma \sqrt{2}})]$ |
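The normal pdf and cdf from the table can be evaluated directly with Python's standard `math.erf`. This is a minimal sketch; the function names and the default parameters (standard normal) are illustrative choices.

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """pdf of N(mu, sigma^2), as given in the table."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)


def normal_cdf(x, mu=0.0, sigma=1.0):
    """cdf of N(mu, sigma^2), written with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


# Symmetry of the standard normal: half of the mass lies below 0.
print(normal_cdf(0.0))

# Probability of falling within one standard deviation of the mean.
within_one_sigma = normal_cdf(1.0) - normal_cdf(-1.0)
print(within_one_sigma)
```

The second value should be about 0.6827, the familiar "68%" within one standard deviation of the mean.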

Expected value

Recap:
The expected value of a random variable is the average or mean value calculated over all the possible outcomes of the variable. It is a measure of the central tendency of the random variable.

Def:
If X is a continuous random variable with pdf $f(x)$ and range $[c,d]$, then its expected value is \begin{equation} E(X) = \int_{c}^{d} x*f(x) dx \end{equation} The expected value is often denoted by $\mu$. $f(x)dx$ is the probability with which X falls in an infinitesimal interval of width $dx$ around $x$.

The following properties of expected value still hold (similar to discrete random variables):

$E(X+Y) = E(X) + E(Y)$
$E(cX + d) = c*E(X) + d$

An additional property is that when $Y = h(X)$, where $X$ is a random variable with pdf $f(x)$, $E(Y) = E(h(X)) = \int_{-\infty}^{\infty} h(x)f(x)dx$
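To make these formulas concrete, the sketch below computes expected values for the random variable of Example 2 (pdf $x/6$ on $[2, 4]$) by numerical integration. The helper `integrate` is a hypothetical midpoint-rule routine, and $h(x) = x^2$ and the constants 3 and 1 are arbitrary illustrative choices.

```python
def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def pdf(x):
    # pdf of Example 2, valid on the range [2, 4].
    return x / 6


# E(X): integrate x * f(x) over the range of X.
mean = integrate(lambda x: x * pdf(x), 2, 4)

# E(h(X)) with h(x) = x^2: integrate h(x) * f(x) over the range.
mean_sq = integrate(lambda x: x ** 2 * pdf(x), 2, 4)

# Linearity: E(3X + 1) should equal 3 * E(X) + 1.
shifted = integrate(lambda x: (3 * x + 1) * pdf(x), 2, 4)

print(mean, mean_sq, shifted, 3 * mean + 1)
```

Analytically, $E(X) = \int_{2}^{4} \frac{x^2}{6}dx = \frac{28}{9}$ and $E(X^2) = \int_{2}^{4} \frac{x^3}{6}dx = 10$.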

Variance and Standard deviation
Recap:
\begin{equation} Var(X) = E((X-\mu)^2) \end{equation} \begin{equation} \sigma = \sqrt{Var(X)} \end{equation}

where $\sigma$ is called the standard deviation. Looking at Var(X) in detail, the distance of each value from the mean is squared and the mean of these squared distances is calculated. Var(X) is therefore the average squared distance of the probability mass from the mean value; for a continuous random variable with pdf $f(x)$ it equals $\int_{-\infty}^{\infty} (x-\mu)^2 f(x)dx$. The distance is squared so that only its magnitude matters, not its sign.

The following properties also hold for the variance of a continuous random variable:

$Var(aX+b) = a^2Var(X)$
$Var(X) = E(X^2)-(E(X))^2$
$Var(X+Y) = Var(X) + Var(Y)$ if $X$ and $Y$ are independent.
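The first two variance properties can be checked on the random variable of Example 2 (pdf $x/6$ on $[2, 4]$). This is a sketch under the same assumptions as before: `integrate` is a hypothetical midpoint-rule helper, and the constants $a = 3$, $b = 1$ are arbitrary.

```python
import math


def integrate(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def pdf(x):
    # pdf of Example 2, valid on the range [2, 4].
    return x / 6


mu = integrate(lambda x: x * pdf(x), 2, 4)

# Variance from the definition E((X - mu)^2).
var_definition = integrate(lambda x: (x - mu) ** 2 * pdf(x), 2, 4)

# Variance from the shortcut E(X^2) - (E(X))^2.
var_shortcut = integrate(lambda x: x ** 2 * pdf(x), 2, 4) - mu ** 2

# Variance of 3X + 1, which should equal 3^2 * Var(X).
mu_scaled = 3 * mu + 1
var_scaled = integrate(lambda x: (3 * x + 1 - mu_scaled) ** 2 * pdf(x), 2, 4)

sigma = math.sqrt(var_definition)
print(var_definition, var_shortcut, var_scaled, sigma)
```

Analytically, $Var(X) = 10 - (28/9)^2 = 26/81 \approx 0.321$.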

Quantiles
One additional measure used for continuous random variables, compared with discrete ones, is the quantile.

Def:
The value of $x$ for which $cdf(x) = p$ is called the $p^{th}$ quantile of $X$.
So, the median of the random variable X is its 0.5 quantile.
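Because the cdf is non-decreasing, a quantile can be found by searching for the point where the cdf reaches $p$. The sketch below does this by bisection for the random variable of Example 2 (pdf $x/6$ on $[2, 4]$); the function name `quantile` and the tolerance are illustrative choices.

```python
import math


def cdf(x):
    """cdf of Example 2: pdf x/6 on the range [2, 4]."""
    if x < 2:
        return 0.0
    if x > 4:
        return 1.0
    return (x ** 2 - 4) / 12


def quantile(p, lo=2.0, hi=4.0, tol=1e-10):
    """Find x in [lo, hi] with cdf(x) = p by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid   # the quantile lies to the right of mid
        else:
            hi = mid   # the quantile lies at or to the left of mid
    return (lo + hi) / 2


median = quantile(0.5)

# Closed form for comparison: (x^2 - 4) / 12 = 0.5 gives x = sqrt(10).
print(median, math.sqrt(10))
```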
