ML-Fundamentals - Bias Variance Tradeoff

Introduction

If you completed the exercises simple-linear-regression, multivariate-linear-regression and logistic-linear-regression, you know how to fit these models to your training data.

On its own this has no practical use. The benefit of learning a model is to predict unseen data. Moreover, only on unseen data the model has not learnt from is it possible to say whether your model generalizes well or not. One way to measure this is to calculate the out of sample error $E_{out}$, which decomposes into the measures bias and variance.

In this notebook you will calculate two simple hypotheses for linear regression based on training data and compare them with the use of unseen validation data by calculating $E_{out}$, bias and variance.

Requirements

Knowledge

You should have a basic knowledge of:

  • Univariate linear regression
  • Out of sample error (bias variance)

Suitable sources for acquiring this knowledge are:

Python Modules

By deep.TEACHING convention, all python modules needed to run the notebook are loaded centrally at the beginning.

import numpy as np
import matplotlib.pyplot as plt

Exercise

Exercise inspired by lecture 8 from:

Implement the following simulation for calculating the bias and variance:

Given:

  • $p(x)$ is uniformly distributed in the interval $[0, 2\pi]$
  • The unknown target function is the sine function, so the targets are $y = \sin(x)$
  • There is no noise on $y$
  • Hypothesis $H_1(x) = \theta_0 + \theta_1 x$
  • Hypothesis $H_2(x) = w$ (a constant)

Task:

  • Do the following 10,000 times:

  • Draw two random examples $x^1$ and $x^2$ from $p(x)$ and calculate the corresponding $y$s to get $(x^1,y^1), (x^2,y^2)$ (training data)

  • Using your training data, calculate the parameters $\theta_0, \theta_1$ for $H_1$ and the parameter $w$ for $H_2(x)$

  • Numerically calculate the out of sample error $E_{out}$ for $H_1(x)$ and $H_2(x)$ on 100 data points uniformly distributed in the interval $[0, 2\pi]$ (validation data)

  • Now calculate the average $\theta_0, \theta_1$ and $w$ over all 10,000 experiments

  • Also calculate the mean of the out of sample error $E_{out}$ for $H_1(x)$ and $H_2(x)$

  • Use the above to calculate the bias and the variance

  • Plot the target function $\sin(x)$ together with both hypotheses $H_1(x)$ and $H_2(x)$ using the average $\theta_0, \theta_1$ and $w$

  • Considering your results, which hypothesis seems to model the target function better?

Practically, this explanation is all you need to solve the exercise. You are free to complete it without any further guidance or to proceed with this notebook.

Data Generation

Task:

Implement the function to draw two random training examples $(x^i,y^i)$ with:

  • $x^i \sim \text{Uniform}(0,2\pi)$
  • $i \in \{1,2\}$
  • $y^i = \sin(x^i)$
def train_data():
    raise NotImplementedError()
x_train, y_train  = train_data()
print(x_train, y_train)
# If your implementation is correct, these tests should not throw an exception

assert len(x_train) == 2
assert len(y_train) == 2
np.testing.assert_array_equal(np.sin(x_train), y_train)
for i in range(1000):
    x_tmp, _ = train_data()
    assert x_tmp.min() >= 0.0
    assert x_tmp.max() <= 2*np.pi
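If you want to compare your approach, the following is one possible sketch (not the only valid implementation); it assumes `train_data` returns two NumPy arrays:

```python
import numpy as np

def train_data():
    # Draw two x values uniformly from [0, 2*pi] ...
    x = np.random.uniform(0, 2 * np.pi, size=2)
    # ... and compute the noise-free targets y = sin(x)
    y = np.sin(x)
    return x, y
```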

Hypothesis

For our training data we will now model two different hypotheses:

$H_1(x) = \theta_0 + \theta_1 x$

and

$H_2(x) = w$

Task:

Implement the functions to calculate the parameters $\theta_0, \theta_1$ for $H_1$ and $w$ for $H_2$ using the two drawn examples.

For later use (passing functions as arguments) it is important that both functions accept the same number of parameters and also return the same number of values. Therefore we also pass $x$ to get_w, although we do not need it. For the same reason get_thetas should return a list of two values instead of two separate values.

def get_thetas(x, y):
    raise NotImplementedError()

def get_w(x, y):
    raise NotImplementedError()
thetas = get_thetas(x_train, y_train)
w = get_w(x_train, y_train)
print(thetas[0], thetas[1])
print(w)
# If your implementation is correct, these tests should not throw an exception

x_train_temp = np.array([0,1])
y_train_temp = np.array([np.sin(x_i) for x_i in x_train_temp])
thetas_test = get_thetas(x_train_temp, y_train_temp)
w_test = get_w(x_train_temp, y_train_temp)

np.testing.assert_almost_equal(thetas_test[0], 0.0)
np.testing.assert_almost_equal(thetas_test[1], 0.8414709848078965)
np.testing.assert_almost_equal(w_test, 0.42073549240394825)
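One possible sketch that passes the tests above: with only two training points, $H_1$ is the line through them, and the constant $w$ that minimizes the squared error is the mean of the targets.

```python
import numpy as np

def get_thetas(x, y):
    # Line through the two training points: slope ...
    theta_1 = (y[1] - y[0]) / (x[1] - x[0])
    # ... and intercept
    theta_0 = y[0] - theta_1 * x[0]
    return [theta_0, theta_1]

def get_w(x, y):
    # The best constant under squared loss is the mean of the targets;
    # x is accepted (and ignored) only to match the signature of get_thetas
    return y.mean()
```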

Task:

Implement the hypotheses $H_1$ and $H_2$. Each function should return a function.

def get_hypothesis_1(thetas):
    raise NotImplementedError()
    
def get_hypothesis_2(w):
    raise NotImplementedError()
# validation data (which our model has not learnt from, but we know the labels)

x_validation = np.linspace(0, 2*np.pi, 100)
y_validation = np.sin(x_validation)
# If your implementation is correct, these tests should not throw an exception

h1_test = get_hypothesis_1(thetas_test)
h2_test = get_hypothesis_2(w_test)
np.testing.assert_almost_equal(h1_test(x_validation)[10], 0.5340523361780719)
np.testing.assert_almost_equal(h2_test(x_validation)[10], 0.42073549240394825)
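One possible sketch uses closures: each getter returns a function of `x`. For $H_2$ the returned function broadcasts the constant to an array, so the result can be indexed like the tests above do.

```python
import numpy as np

def get_hypothesis_1(thetas):
    # Return a function of x implementing H_1(x) = theta_0 + theta_1 * x
    return lambda x: thetas[0] + thetas[1] * x

def get_hypothesis_2(w):
    # Return a function of x implementing the constant hypothesis H_2(x) = w,
    # broadcast to the shape of x
    return lambda x: w * np.ones_like(x)
```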

Plot

Following the original exercise, it is not yet necessary to plot anything. But it does not hurt to do so, since we need to implement the plotting code anyway.

Task:

Write the function to plot:

  • the two examples $(x^1,y^1)$ and $(x^2,y^2)$
  • the true target function $\sin(x)$ in the interval $[0, 2\pi]$
  • the hypothesis $H_1$ in the interval $[0, 2\pi]$
  • the hypothesis $H_2$ in the interval $[0, 2\pi]$

Your plot should look similar to this one:

def plot_true_target_function_x_y_h1_h2(x, y, hypothesis1, hypothesis2):
    raise NotImplementedError()
thetas = get_thetas(x_train, y_train)
w = get_w(x_train, y_train)
plot_true_target_function_x_y_h1_h2(x_train, y_train, get_hypothesis_1(thetas), get_hypothesis_2(w))

Out of Sample Error

The out of sample error $E_{out}(H)$ is the expected value of the test error, which can be estimated with unseen validation data $x', y'$ as:

$E_{out}(H) = \mathbb E_{x,y}[\text{loss}(H(x),y)] \approx \frac{1}{m} \sum_{i=1}^m \text{loss}(H(x'_i), y'_i)$

Task:

Implement the function to numerically calculate the out of sample error $E_{out}$ with the mean squared error as loss function.

def out_of_sample_error(y_preds, y):
    raise NotImplementedError()
# If your implementation is correct, these tests should not throw an exception

e_out_h1_test = out_of_sample_error(h1_test(x_validation), y_validation)
np.testing.assert_almost_equal(e_out_h1_test, 11.525485917588728)
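With the mean squared error as loss, one possible sketch is a one-liner:

```python
import numpy as np

def out_of_sample_error(y_preds, y):
    # Mean squared error between predictions and true labels
    return np.mean((y_preds - y) ** 2)
```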

Repeat

Task:

Now instead of drawing two examples once, draw two examples 10,000 times and calculate $E_{out}$ for $H_1$ and $H_2$ given the validation data.

For each run, keep track of the following parameters and return them at the end of the function:

  • $(x^1,x^2)$
  • $(y^1,y^2)$
  • $\theta_0$
  • $\theta_1$
  • $w$
  • $E_{out}$ for $H_1$ and for $H_2$
def run_experiment(m, x_val, y_val):
    raise NotImplementedError()
xs, ys, t0s, t1s, ws, e_out_h1s, e_out_h2s = run_experiment(
    10000, x_validation, y_validation)
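One way to structure `run_experiment` is sketched below. It assumes the helper functions behave as described in the cells above; minimal stand-ins are included here so the sketch runs on its own (replace them with your implementations):

```python
import numpy as np

# Minimal stand-ins for the previous cells (replace with your own versions)
def train_data():
    x = np.random.uniform(0, 2 * np.pi, size=2)
    return x, np.sin(x)

def get_thetas(x, y):
    t1 = (y[1] - y[0]) / (x[1] - x[0])
    return [y[0] - t1 * x[0], t1]

def get_w(x, y):
    return y.mean()

def out_of_sample_error(y_preds, y):
    return np.mean((y_preds - y) ** 2)

def run_experiment(m, x_val, y_val):
    # Preallocate one slot per experiment
    xs, ys = np.zeros((m, 2)), np.zeros((m, 2))
    t0s, t1s, ws = np.zeros(m), np.zeros(m), np.zeros(m)
    e_out_h1s, e_out_h2s = np.zeros(m), np.zeros(m)
    for i in range(m):
        # Draw a fresh two-point training set and fit both hypotheses
        x, y = train_data()
        thetas = get_thetas(x, y)
        w = get_w(x, y)
        xs[i], ys[i] = x, y
        t0s[i], t1s[i], ws[i] = thetas[0], thetas[1], w
        # Evaluate both hypotheses on the fixed validation data
        e_out_h1s[i] = out_of_sample_error(thetas[0] + thetas[1] * x_val, y_val)
        e_out_h2s[i] = out_of_sample_error(w * np.ones_like(x_val), y_val)
    return xs, ys, t0s, t1s, ws, e_out_h1s, e_out_h2s
```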

Average and Plot

Now we can calculate the averages of $\theta_0, \theta_1$, $w$ and $E_{out}$ and plot the resulting averaged hypotheses $\tilde H_1$ and $\tilde H_2$ together with the target function $\sin(x)$.

Your plot should look similar to the one below:

t0_avg = t0s.mean()
t1_avg = t1s.mean()
thetas_avg = [t0_avg, t1_avg]
w_avg = ws.mean()
h1_avg = get_hypothesis_1(thetas_avg)
h2_avg = get_hypothesis_2(w_avg)
plot_true_target_function_x_y_h1_h2([], [], h1_avg, h2_avg)
expectation_Eout_1 = e_out_h1s.mean()
print ("expectation of E_out of model 1:", expectation_Eout_1)
expectation_Eout_2 = e_out_h2s.mean()
print ("expectation of E_out of model 2:", expectation_Eout_2)

Bias

The bias is defined as:

$bias = \mathbb E_x \left[(\tilde H(x') - y')^2\right]$

with:

  • the average hypothesis $\tilde H$ and its output on the unseen data $\tilde H(x')$
  • $y'$ the ground truth (true) labels for $x'$

Task:

Implement the function to calculate the bias.

def bias(y_true, y_predicted):
    raise NotImplementedError()
bias_1 = bias(y_validation,  h1_avg(x_validation))
print ("Bias of model 1:", bias_1)
bias_2 = bias(y_validation,  h2_avg(x_validation))
print ("Bias of model 2:", bias_2)
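Following the definition above, one possible sketch of `bias` is again a mean squared deviation, here between the average hypothesis' predictions and the true labels:

```python
import numpy as np

def bias(y_true, y_predicted):
    # Mean squared deviation of the average hypothesis from the true labels
    return np.mean((y_predicted - y_true) ** 2)
```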

Variance

Variance:

$variance = \mathbb E_x \left[ \mathbb E_D \left[(H^D(x') - \tilde H(x'))^2 \right] \right]$

with:

  • the average hypothesis $\tilde H$ and its output on the unseen data $\tilde H(x')$
  • the learnt hypothesis $H^D$ for training data $D$. Here $D$ corresponds to $x$ and $y$ (without the prime)

Task:

Implement the function to calculate the variances for each of the 10,000 experiments and return them as a list or array.

Now we benefit from our implementations of get_thetas and get_w, respectively get_hypothesis_1 and get_hypothesis_2, which accept and return the same number of parameters, so we can write a generalized function.

def variances(hypothesis_func, param_func, xs, ys, x_val, y_preds):
    raise NotImplementedError()
vars_1 = variances(get_hypothesis_1, 
                 get_thetas, 
                 xs, ys, 
                 x_validation, 
                 h1_avg(x_validation))
var_1_avg = vars_1.mean()
print(var_1_avg)
vars_2 = variances(get_hypothesis_2, 
                 get_w, 
                 xs, ys, 
                 x_validation, 
                 h2_avg(x_validation))
var_2_avg = vars_2.mean()
print(var_2_avg)
print("model 1: E_out ≈ bias + variance:  %f ≈ %f + %f" % (expectation_Eout_1, bias_1, var_1_avg))
print("model 2: E_out ≈ bias + variance:  %f ≈ %f + %f" % (expectation_Eout_2, bias_2, var_2_avg))
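One possible sketch of the generalized `variances` function: refit the hypothesis on each experiment's training pair and measure its squared deviation from the average hypothesis on the validation points. Minimal stand-ins for the earlier cells are included so the sketch runs on its own (replace them with your implementations):

```python
import numpy as np

# Minimal stand-ins for the previous cells (replace with your own versions)
def get_thetas(x, y):
    t1 = (y[1] - y[0]) / (x[1] - x[0])
    return [y[0] - t1 * x[0], t1]

def get_hypothesis_1(thetas):
    return lambda x: thetas[0] + thetas[1] * x

def variances(hypothesis_func, param_func, xs, ys, x_val, y_preds):
    # One variance estimate per experiment: the squared deviation of the
    # hypothesis refit on that experiment's training pair from the average
    # hypothesis' predictions y_preds, averaged over the validation points
    result = np.zeros(len(xs))
    for i in range(len(xs)):
        h = hypothesis_func(param_func(xs[i], ys[i]))
        result[i] = np.mean((h(x_val) - y_preds) ** 2)
    return result
```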

Summary and Outlook

[TODO]

Licenses

Notebook License (CC-BY-SA 4.0)

The following license applies to the complete notebook, including code cells. However, it does not apply to any referenced external media (e.g., images).

Exercise: Bias Variance Tradeoff
by Christian Herta, Klaus Strohmenger
is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://gitlab.com/deep.TEACHING.

Code License (MIT)

The following license only applies to code cells of the notebook.

Copyright 2018 Christian Herta, Klaus Strohmenger

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.