Exercise: Probabilistic Rankings
Introduction
In this assignment, you’ll use the (binary) results of the 2011 ATP men’s tennis singles season to compute probabilistic rankings of the players’ skills. The data covers 107 players and a total of 1801 games, which these players played against each other during the 2011 season.
Remark: In order to detect errors in your own code, execute the notebook cells containing assert or assert_almost_equal. These statements raise exceptions as long as the calculated result is not yet correct.
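To illustrate how such checks behave, here is a minimal sketch. It assumes that assert_almost_equal refers to numpy.testing.assert_almost_equal, which the import list of this notebook does not show explicitly; the values are made up for the illustration.

```python
import numpy as np
from numpy.testing import assert_almost_equal

value = 0.1 + 0.2

# Passes silently: the values agree to 7 decimal places (the default).
assert_almost_equal(value, 0.3)

# Raises an AssertionError because the values differ in the second decimal.
try:
    assert_almost_equal(value, 0.31)
    raised = False
except AssertionError:
    raised = True

print("check failed as expected:", raised)
```

A cell that runs without an exception therefore means the check passed.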
Requirements
Knowledge
Python Modules
import numpy as np
import pymc3 as pm
import theano
from theano import tensor as T
from matplotlib import pyplot as plt
from IPython.core.pylabtools import figsize
%matplotlib inline
Exercises
Data
If you have not cloned the whole git directory, download the files:

https://gitlab.com/deep.TEACHING/educationalmaterials/blob/master/datasets/tennis_games.npy

https://gitlab.com/deep.TEACHING/educationalmaterials/blob/master/datasets/tennis_players.npy
and adjust the paths.
tennis_players = np.load("../../../../../datasets/tennis_players.npy")
nb_tennis_players = len(tennis_players)
tennis_games = np.load("../../../../../datasets/tennis_games.npy")
tennis_games.shape
tennis_games is an 1801 by 2 matrix of the played games, one row per game: the first column contains the id of the player who won the game, and the second column contains the id of the player who lost.
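To get a feeling for this layout, one can count wins and losses per player. The following sketch uses a small made-up array (toy_games) in place of tennis_games and assumes that player ids are 0-based indices into the player list.

```python
import numpy as np

# Toy stand-in for tennis_games: each row is (winner_id, loser_id).
toy_games = np.array([[0, 1], [0, 2], [1, 2], [2, 0], [0, 1]])
nb_players = 3  # assumed number of players in the toy data

# Count how often each id appears in the winner and in the loser column.
wins = np.bincount(toy_games[:, 0], minlength=nb_players)
losses = np.bincount(toy_games[:, 1], minlength=nb_players)

# Empirical win rate per player (every toy player has at least one game).
win_rate = wins / (wins + losses)
print(wins, losses, win_rate)
```

Such raw win rates ignore the strength of the opponents, which is what the probabilistic model in the task is meant to account for.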
tennis_games
Task:
 Use pymc to develop a ranking system.
 Plot the ranking according to your (learnt) model.

Write a function which gets as input the ids of two players and prints (or returns) a prediction of the probabilities that player 1 resp. player 2 wins, e.g.:
> print_prediction(10, 12)
AndyMurray: 0.56
DavidNalbandian: 0.44
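One possible way to turn two skill values into win probabilities is a logistic link on the skill difference (a Bradley-Terry-style model). This is an assumption made for illustration, not necessarily the model intended by the exercise. The names win_probability, format_prediction, toy_names and toy_skills are hypothetical.

```python
import numpy as np

def win_probability(skill_a, skill_b):
    """Probability that A beats B under an assumed logistic link."""
    return 1.0 / (1.0 + np.exp(-(skill_a - skill_b)))

def format_prediction(id_a, id_b, skills, names):
    """Return the win probabilities of both players as one string."""
    p_a = win_probability(skills[id_a], skills[id_b])
    return "{}: {:.2f} {}: {:.2f}".format(
        names[id_a], p_a, names[id_b], 1.0 - p_a)

# Made-up names and skills; equal skills give a 50/50 prediction.
toy_names = ["PlayerA", "PlayerB"]
toy_skills = np.array([0.5, 0.5])
print(format_prediction(0, 1, toy_skills, toy_names))
# prints "PlayerA: 0.50 PlayerB: 0.50"
```

With learnt skills, the same function could be called with the posterior mean skills and the tennis_players array.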
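# Each row: (winner id, loser id, outcome). The outcome is always 1,
# because the first column holds the winner of the game.
results = np.empty([len(tennis_games), 3], dtype="int32")
results[:,0:2] = tennis_games
results[:,2] = 1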
results
pos = np.arange(nb_tennis_players)+.5
plt.figure(figsize=(10,50))
plt.barh(pos, skills_mean, align='center')
plt.yticks(pos, tennis_players)
plt.ylim(0, nb_tennis_players)
plt.xlabel('Performance')
plt.title('Scoring of the tennis players.')
plt.grid(True)
def get_scores(skills_mean):
    scores = dict()
    # mean of skill
    for i, name in enumerate(tennis_players):
        scores[name] = skills_mean[i]
    # sort by score, best player first
    sorted_scores = sorted(scores.items(), key=lambda k: k[1], reverse=True)
    return sorted_scores
sorted_scores = get_scores(skills_mean)
def print_scoring(sorted_scores):
    for i in sorted_scores:
        print(u'{:30s} {:2.3f}'.format(i[0], i[1]))
print_scoring(sorted_scores)
# probability that player 10 wins against player 13
print_prediction_on_full_trace(10, 13, trace)
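The body of print_prediction_on_full_trace is not shown in this notebook. A minimal sketch of the idea, assuming a logistic link on the skill difference and posterior samples stored as an array of shape (n_samples, n_players), could average the win probability over all samples. With pymc3 such an array might be obtained via trace["skills"] if the skill variable were named "skills"; that name is hypothetical. The samples below are synthetic.

```python
import numpy as np

def prediction_from_samples(id_a, id_b, skill_samples):
    """Average the win probability of A over all posterior samples."""
    diff = skill_samples[:, id_a] - skill_samples[:, id_b]
    p = 1.0 / (1.0 + np.exp(-diff))  # assumed logistic link
    return p.mean()

# Synthetic "posterior" samples for three players, for illustration only.
rng = np.random.RandomState(0)
toy_samples = rng.normal(loc=[1.0, 0.0, -1.0], scale=0.3, size=(2000, 3))

p_01 = prediction_from_samples(0, 1, toy_samples)
p_10 = prediction_from_samples(1, 0, toy_samples)
print("P(0 beats 1) = {:.2f}, P(1 beats 0) = {:.2f}".format(p_01, p_10))
```

Averaging over the samples, rather than plugging in the mean skills, lets the uncertainty about the skills enter the prediction.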
