# Exercise - Univariate Gaussian Basics

## Introduction

The normal distribution, also called the Gaussian distribution, appears in many different domains. One reason is the central limit theorem (CLT): if you draw $n$ independent random variables from the same distribution with finite variance (e.g. rolling a die with outcomes $\{1, \dots, 6\}$ 10 times, or flipping a coin with outcomes $\{0, 1\}$ 10 times) and compute the mean (or sum) of that sample, then the distribution of this mean (or sum) over many repetitions of the experiment is approximately Gaussian, and the approximation improves as $n$ grows.
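The die example can be simulated in a few lines. The following is a minimal sketch, not part of the exercise: it uses only the Python standard library, and the helper name `sample_means`, the seed, and the number of repetitions are assumptions chosen for illustration.

```python
import random
import statistics


def sample_means(n_rolls, n_repeats, seed=0):
    """Return n_repeats means, each computed over n_rolls fair die rolls."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        statistics.fmean(rng.randint(1, 6) for _ in range(n_rolls))
        for _ in range(n_repeats)
    ]


means = sample_means(n_rolls=10, n_repeats=20000)

# A fair die has mean 3.5 and variance 35/12, so the mean of 10 rolls
# should have mean 3.5 and standard deviation sqrt(35/120), about 0.54.
center = statistics.fmean(means)
spread = statistics.stdev(means)
print(center, spread)

# For a Gaussian, roughly 68% of the values lie within one standard
# deviation of the mean; the simulated means should come close to that.
within_one_sd = sum(abs(v - center) <= spread for v in means) / len(means)
print(within_one_sd)
```

Plotting a histogram of `means` should show the familiar bell shape, even though a single die roll is uniformly distributed.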

Furthermore, the Gaussian has a very convenient PDF since we only need two parameters to describe it:

• the variance $\sigma^2$ and
• the mean $\mu$.

Remark: To detect errors in your own code, execute the notebook cells containing assert or assert_almost_equal. These statements raise an exception as long as the calculated result is not yet correct.

## Requirements

### Knowledge

To complete this exercise notebook, you should possess knowledge about the following topics.

• Univariate Gaussian
• Empirical mean
• Variance / sample variance

The following material can help you to acquire this knowledge:

• Gaussian, variance, mean:
  • Chapter 3 of the Deep Learning Book [GOO16]
  • Chapter 1 of the book Pattern Recognition and Machine Learning by Christopher M. Bishop [BIS07]
• Univariate Gaussian:
  • Video1 and the following videos from Khan Academy [KHA18a]
• Sample variance:
  • Video2 and the following videos from Khan Academy [KHA18b]

### Python Modules

# External Modules
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats

%matplotlib inline

## Exercises

From an experiment we obtain a random sample $x_1, \dots, x_N$ of size $N$ from a Gaussian distribution:

$P(x\mid\mu,\sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$

with:

• the mean $\mu$
• the standard deviation $\sigma$
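The density above can be evaluated term by term. The following is a minimal sketch, assuming a hypothetical helper `gaussian_pdf` that is not part of the notebook; it cross-checks the result against `statistics.NormalDist` from the standard library (the notebook itself uses `scipy.stats.norm.pdf` for plotting).

```python
import math
from statistics import NormalDist


def gaussian_pdf(x, mu, sigma):
    """Evaluate the Gaussian density at x, following the formula above."""
    normalization = 1.0 / math.sqrt(2.0 * math.pi * sigma**2)
    exponent = -((x - mu) ** 2) / (2.0 * sigma**2)
    return normalization * math.exp(exponent)


# Same parameter values as used in the cells below.
reference = NormalDist(mu=-1.5, sigma=3.0)

for point in (-4.0, -1.5, 0.0, 2.5):
    own = gaussian_pdf(point, mu=-1.5, sigma=3.0)
    # Both implementations should agree up to floating-point error.
    assert math.isclose(own, reference.pdf(point))
    print(point, own)

# At x = mu the exponent vanishes, so the peak height is
# 1 / sqrt(2 * pi * sigma^2), about 0.133 for sigma = 3.
peak = gaussian_pdf(-1.5, mu=-1.5, sigma=3.0)
```

Note that the peak height depends only on $\sigma$: a larger standard deviation gives a flatter, wider curve.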

mu = -1.5
sigma = 3
sigma_square = sigma**2
size = 10

def plot_gaussian_pdf(mu, sigma):
    # Evaluate the density on a grid covering +/- 4 standard deviations
    x = np.linspace(mu - 4*sigma, mu + 4*sigma, 100)
    plt.plot(x, stats.norm.pdf(x, mu, sigma))

plot_gaussian_pdf(mu, sigma)

def get_data(mu, sigma_square, size):
    # np.random.normal expects the standard deviation, not the variance
    sigma = np.sqrt(sigma_square)
    x = np.random.normal(loc=mu, scale=sigma, size=size)
    return x

x = get_data(mu, sigma_square, size)
x

def plot_hist(x):
    # plt.subplots returns the figure and the axes object used below
    fig, ax = plt.subplots(figsize=(12, 5), dpi=80)
    ax.hist(x, bins=3)
    ax.set_xlabel(r'$x$', fontsize=20)
    ax.set_ylabel(r'$c$', fontsize=20)
    ax.set_title("Histogram of sample x", fontsize=20)

plot_hist(x)

### Exercise - Empirical Mean

The empirical mean $\hat{\mu}$ of a data set $\mathcal D = \{x_1, x_2, \dots, x_N \}$ is:

$\hat{\mu} = \frac{1}{N} \sum_{i=1}^N x_i$

with

• $N$: number of data points
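
As a small worked example with made-up numbers (not taken from the sample above): for $\mathcal D = \{1, 2, 6\}$ we have $N = 3$ and

$\hat{\mu} = \frac{1}{3} \left( 1 + 2 + 6 \right) = 3$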

Implement the function to calculate the empirical mean without using the function np.mean.

def mean(x):
    """ Calculates the mean of x """
    raise NotImplementedError

np.testing.assert_almost_equal(mean(x), x.mean())
print(mean(x))

### Exercise - Sample Variance

Implement the function to calculate the sample variance with:

$\hat \sigma_{N}^2 = \frac{1}{N} \sum_{i=1}^N \left( x_i - \hat{\mu} \right)^2$

or, respectively,

$\hat \sigma_{N-1}^2 = \frac{1}{N-1} \sum_{i=1}^N \left( x_i - \hat{\mu} \right)^2$

Your function should be able to handle both cases (and any divisor $N-a$ with $a \in \{0, 1, \dots, N-1\}$), depending on the ddof argument (delta degrees of freedom), so that the divisor is $N$ minus ddof. If the argument mean_ is None, use the empirical mean.
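
To make the role of ddof concrete, the following minimal sketch compares the two divisors on a toy sample. It deliberately uses Python's standard-library `statistics` module rather than the function you are asked to write: `statistics.pvariance` divides by $N$ and so corresponds to ddof=0, while `statistics.variance` divides by $N-1$ and corresponds to ddof=1. The data values are made up for illustration.

```python
import statistics

# Toy sample with mean 3.0; the squared deviations are 4, 1 and 9 (sum 14).
data = [1.0, 2.0, 6.0]

var_n = statistics.pvariance(data)         # divides by N     (ddof=0): 14 / 3
var_n_minus_1 = statistics.variance(data)  # divides by N - 1 (ddof=1): 14 / 2

print(var_n, var_n_minus_1)
```

The two results differ only by the factor $N / (N-1)$, so the gap shrinks as the sample size grows.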

def var(x, mean_=None, ddof=0):
    """ Calculates the variance of x

    :x: sample
    :x type: numpy array type float
    :mean_: mean to use for the calculation.
        if mean_ is None, the empirical mean of x
        will be used
    :mean_ type: float or None
    :ddof: delta degrees of freedom
    :ddof type: integer

    :return: the variance of x
    :r type: float

    """
    raise NotImplementedError

np.testing.assert_almost_equal(var(x, ddof=1), np.var(x, ddof=1))
np.testing.assert_almost_equal(var(x, ddof=0), np.var(x, ddof=0))

1. Sample $m$ such data sets and compute the estimate of the variance $\sigma^2$ (e.g. averaged over the $m$ data sets), once with ddof=0 and once with ddof=1.
2. From the results of your simulation, conclude which estimator, $\hat \sigma_N^2$ or $\hat \sigma_{N-1}^2$, is biased and which is unbiased.

def get_sigma_square_estimate(m, mu, sigma_square, size, ddof=0):
    """ Estimates the variance of m Gaussian samples
    using their empirical variance

    :m: number of samples
    :m type: integer
    :mu: mean of the Gaussian
    :mu type: float
    :size: size of each sample
    :size type: unsigned integer
    :sigma_square: sigma_square (variance) of the Gaussian
    :sigma_square type: float
    :ddof: delta degrees of freedom
    :ddof type: integer

    :return: estimated variance
    :r type: float
    """
    raise NotImplementedError

m = 100000
print("ddof=0: var:\t", get_sigma_square_estimate(m, mu, sigma_square, size, ddof=0))
print("ddof=1: var:\t", get_sigma_square_estimate(m, mu, sigma_square, size, ddof=1))

## Literature

The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).

Exercise - Multivariate Gaussian
by Christian Herta, Klaus Strohmenger