2 - 7) EDA - Elo rating information

46 minute read

import pandas as pd 
import numpy as np  

import matplotlib.pyplot as plt 
import seaborn as sns 

import sqlite3 as sql 
import requests
import csv 
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)

Elo rating

  • The World Football Elo Ratings are a ranking system for men’s national association football teams that is published by the website eloratings.net. It is based on the Elo rating system but includes modifications to take various football-specific variables into account, like the margin of victory, importance of a match, and home field advantage. Other implementations of the Elo rating system are possible and there is no single nor any official Elo ranking for football teams. (https://en.wikipedia.org/wiki/World_Football_Elo_Ratings#cite_note-Laesk_et._al.-1)
  • Since being developed, the Elo rankings have been found to have the highest predictive capability for football matches. (J. Lasek, Z. Szlávik and S. Bhulai (2013), The predictive power of ranking systems in association football, Archived 26 July 2014 at the Wayback Machine, Int. J. Applied Pattern Recognition1: 27–46.)
  • The Elo system with a few modifications (home field advantage, goal difference, inter-league adjustments) is used to determine the club’s Elo values.
  • So let’s use the Elo rating for soccer club to predict the match results.

1. Get the Elo rating data

  • Let’s get the Elo rating information by the API request from the ClubELO website. (http://clubelo.com/API)
  • Since we want to know the Elo rating before every match date, we request the Elo rating of the day before every match date.
df_match_basic = pd.read_csv("../data/df_match_basic.csv")
df_team = pd.read_csv("../data/df_team.csv")
elo_target_date = (pd.to_datetime(df_match_basic.match_date.unique()) - timedelta(days = 1)).strftime("%Y-%m-%d")
elo_rating = pd.DataFrame(columns = ["Rank", "Club", "Country", "Level", "Elo", "From", "To", "elo_target_date", "match_date"])
for target_date in elo_target_date:
    api_url = f"http://api.clubelo.com/{target_date}"
    
    response = requests.get(api_url)
    decoded_content = response.content.decode("utf-8")
    cr = csv.reader(decoded_content.splitlines(), delimiter = ",")
    
    my_list = list(cr)
    
    target_date_elo_rating = pd.DataFrame(my_list[1:], columns = my_list[0])
    target_date_elo_rating["elo_target_date"] = target_date
    target_date_elo_rating["match_date"] = (pd.to_datetime(target_date) + timedelta(days = 1)).strftime("%Y-%m-%d")
    
    elo_rating = pd.concat([elo_rating, target_date_elo_rating])
elo_rating = elo_rating[~elo_rating.Club.isna()]
  • Since team names from the elo rating are different with our data’s club names, let’s match the club names.
## England Premier League

elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Arsenal"), "team_api_id"] = 9825
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Aston Villa"), "team_api_id"] = 10252
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Birmingham"), "team_api_id"] = 8658
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Blackburn"), "team_api_id"] = 8655
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Blackpool"), "team_api_id"] = 8483
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Bolton"), "team_api_id"] = 8559
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Bournemouth"), "team_api_id"] = 8678
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Burnley"), "team_api_id"] = 8191
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Cardiff"), "team_api_id"] = 8344
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Chelsea"), "team_api_id"] = 8455
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Crystal Palace"), "team_api_id"] = 9826
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Everton"), "team_api_id"] = 8668
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Fulham"), "team_api_id"] = 9879
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Hull"), "team_api_id"] = 8667
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Leicester"), "team_api_id"] = 8197
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Liverpool"), "team_api_id"] = 8650
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Man City"), "team_api_id"] = 8456
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Man United"), "team_api_id"] = 10260
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Middlesbrough"), "team_api_id"] = 8549
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Newcastle"), "team_api_id"] = 10261
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Norwich"), "team_api_id"] = 9850
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Portsmouth"), "team_api_id"] = 8462
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "QPR"), "team_api_id"] = 10172
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Reading"), "team_api_id"] = 9798
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Southampton"), "team_api_id"] = 8466
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Stoke"), "team_api_id"] = 10194
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Sunderland"), "team_api_id"] = 8472
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Swansea"), "team_api_id"] = 10003
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Tottenham"), "team_api_id"] = 8586
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Watford"), "team_api_id"] = 9817
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "West Brom"), "team_api_id"] = 8659
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "West Ham"), "team_api_id"] = 8654
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Wigan"), "team_api_id"] = 8528
elo_rating.loc[(elo_rating.Country == "ENG") & (elo_rating.Club == "Wolves"), "team_api_id"] = 8602
## Germany 1. Bundesliga

elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Koeln"), "team_api_id"] = 8722
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Nuernberg"), "team_api_id"] = 8165
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Mainz"), "team_api_id"] = 9905
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Leverkusen"), "team_api_id"] = 8178
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Dortmund"), "team_api_id"] = 9789
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Bielefeld"), "team_api_id"] = 9912
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Braunschweig"), "team_api_id"] = 9776
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Frankfurt"), "team_api_id"] = 9810
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Augsburg"), "team_api_id"] = 8406
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Bayern"), "team_api_id"] = 9823
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Cottbus"), "team_api_id"] = 8398
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Ingolstadt"), "team_api_id"] = 8234
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Schalke"), "team_api_id"] = 10189
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "St Pauli"), "team_api_id"] = 8152
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Duesseldorf"), "team_api_id"] = 8194
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Hamburg"), "team_api_id"] = 9790
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Hannover"), "team_api_id"] = 9904
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Hertha"), "team_api_id"] = 8177
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Karlsruhe"), "team_api_id"] = 8295
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Freiburg"), "team_api_id"] = 8358
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Paderborn"), "team_api_id"] = 8460
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Darmstadt"), "team_api_id"] = 8262
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Werder"), "team_api_id"] = 8697
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Fuerth"), "team_api_id"] = 8357
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Hoffenheim"), "team_api_id"] = 8226
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Stuttgart"), "team_api_id"] = 10269
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Bochum"), "team_api_id"] = 9911
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Wolfsburg"), "team_api_id"] = 8721
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Gladbach"), "team_api_id"] = 9788
elo_rating.loc[(elo_rating.Country == "GER") & (elo_rating.Club == "Lautern"), "team_api_id"] = 8350
## Poland Ekstraklasa

elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Arka"), "team_api_id"] = 8322
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Cracovia"), "team_api_id"] = 2186
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Belchatow"), "team_api_id"] = 8569
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Gornik"), "team_api_id"] = 8019
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Jagiellonia"), "team_api_id"] = 1957
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Korona"), "team_api_id"] = 8245
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Lech"), "team_api_id"] = 2182
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Lechia"), "team_api_id"] = 8030
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Legia"), "team_api_id"] = 8673
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Odra Wodzislaw"), "team_api_id"] = 8242
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Polonia Warszawa"), "team_api_id"] = 2183
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Piast Gliwice"), "team_api_id"] = 8028
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Podbeskidzie"), "team_api_id"] = 8033
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Pogon"), "team_api_id"] = 8023
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Bytom"), "team_api_id"] = 8020
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Ruch"), "team_api_id"] = 1601
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Widzew"), "team_api_id"] = 8024
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Wisla"), "team_api_id"] = 10265
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Lubin"), "team_api_id"] = 8021
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Zawisza"), "team_api_id"] = 8027
elo_rating.loc[(elo_rating.Country == "POL") & (elo_rating.Club == "Slask"), "team_api_id"] = 8025
## Belgium Jupiler League

elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Beerschot AC"), "team_api_id"] = 9993
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Brugge"), "team_api_id"] = 8342
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Gent"), "team_api_id"] = 9991
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Eupen"), "team_api_id"] = 6351
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Genk"), "team_api_id"] = 9987
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Cercle Brugge"), "team_api_id"] = 9984
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Roeselare"), "team_api_id"] = 9999
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Kortrijk"), "team_api_id"] = 8571
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Mechelen"), "team_api_id"] = 8203
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Oostende"), "team_api_id"] = 8573
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Westerlo"), "team_api_id"] = 10001
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Lierse"), "team_api_id"] = 9989
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Leuven"), "team_api_id"] = 1773
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Anderlecht"), "team_api_id"] = 8635
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Mouscron"), "team_api_id"] = 9996
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Zulte Waregem"), "team_api_id"] = 10000
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "St Truiden"), "team_api_id"] = 9997
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Charleroi"), "team_api_id"] = 9986
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Lokeren"), "team_api_id"] = 9994
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Standard"), "team_api_id"] = 9985
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Beveren"), "team_api_id"] = 8475
elo_rating.loc[(elo_rating.Country == "BEL") & (elo_rating.Club == "Bergen"), "team_api_id"] = 9998
## Spain LIGA BBVA

elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Bilbao"), "team_api_id"] = 8315
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Atletico"), "team_api_id"] = 9906
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Osasuna"), "team_api_id"] = 8371
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Numancia"), "team_api_id"] = 8388
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Tenerife"), "team_api_id"] = 9867
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Cordoba"), "team_api_id"] = 7869
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Elche"), "team_api_id"] = 10268
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Barcelona"), "team_api_id"] = 8634
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Getafe"), "team_api_id"] = 8305
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Granada"), "team_api_id"] = 7878
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Hercules"), "team_api_id"] = 10278
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Levante"), "team_api_id"] = 8581
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Malaga"), "team_api_id"] = 9864
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Celta"), "team_api_id"] = 9910
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Depor"), "team_api_id"] = 9783
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Recreativo"), "team_api_id"] = 8479
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Espanyol"), "team_api_id"] = 8558
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Mallorca"), "team_api_id"] = 8661
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Santander"), "team_api_id"] = 8696
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Rayo Vallecano"), "team_api_id"] = 8370
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Betis"), "team_api_id"] = 8603
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Real Madrid"), "team_api_id"] = 8633
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Sociedad"), "team_api_id"] = 8560
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Gijon"), "team_api_id"] = 9869
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Valladolid"), "team_api_id"] = 10281
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Zaragoza"), "team_api_id"] = 8394
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Eibar"), "team_api_id"] = 8372
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Sevilla"), "team_api_id"] = 8302
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Almeria"), "team_api_id"] = 9865
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Las Palmas"), "team_api_id"] = 8306
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Valencia"), "team_api_id"] = 10267
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Villarreal"), "team_api_id"] = 10205
elo_rating.loc[(elo_rating.Country == "ESP") & (elo_rating.Club == "Xerez"), "team_api_id"] = 9868
## Netherlands Eredivisie

elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Den Haag"), "team_api_id"] = 10217
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Alkmaar"), "team_api_id"] = 10229
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Ajax"), "team_api_id"] = 8593
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "De Graafschap"), "team_api_id"] = 8526
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Excelsior"), "team_api_id"] = 10218
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Dordrecht"), "team_api_id"] = 6631
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Groningen"), "team_api_id"] = 8674
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Twente"), "team_api_id"] = 8611
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Utrecht"), "team_api_id"] = 9908
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Feyenoord"), "team_api_id"] = 10235
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Go Ahead Eagles"), "team_api_id"] = 6433
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Heracles"), "team_api_id"] = 9791
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Nijmegen"), "team_api_id"] = 8464
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Breda"), "team_api_id"] = 9761
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Zwolle"), "team_api_id"] = 6413
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "PSV"), "team_api_id"] = 8640
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Waalwijk"), "team_api_id"] = 10219
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Roda"), "team_api_id"] = 9803
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Cambuur"), "team_api_id"] = 7788
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Heerenveen"), "team_api_id"] = 10228
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Sparta Rotterdam"), "team_api_id"] = 8614
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Venlo"), "team_api_id"] = 9839
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Vitesse"), "team_api_id"] = 8277
elo_rating.loc[(elo_rating.Country == "NED") & (elo_rating.Club == "Willem II"), "team_api_id"] = 8525
## Portugal Liga ZON Sagres

elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Academica"), "team_api_id"] = 10215
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Boavista"), "team_api_id"] = 8613
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Nacional"), "team_api_id"] = 10214
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Belenenses"), "team_api_id"] = 9807
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Maritimo"), "team_api_id"] = 10212
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Estoril"), "team_api_id"] = 7842
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Arouca"), "team_api_id"] = 158085
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Pacos Ferreira"), "team_api_id"] = 6403
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Penafiel"), "team_api_id"] = 6547
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Porto"), "team_api_id"] = 9773
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Gil Vicente"), "team_api_id"] = 9764
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Leixoes"), "team_api_id"] = 6421
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Moreirense"), "team_api_id"] = 8348
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Naval"), "team_api_id"] = 9809
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Rio Ave"), "team_api_id"] = 7841
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Olhanense"), "team_api_id"] = 2033
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Beira Mar"), "team_api_id"] = 10211
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Braga"), "team_api_id"] = 10264
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Benfica"), "team_api_id"] = 9772
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Sporting"), "team_api_id"] = 9768
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Leiria"), "team_api_id"] = 9771
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Guimaraes"), "team_api_id"] = 7844
elo_rating.loc[(elo_rating.Country == "POR") & (elo_rating.Club == "Setubal"), "team_api_id"] = 10238
## Italy Serie A

elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Atalanta"), "team_api_id"] = 8524
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Bari"), "team_api_id"] = 9976
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Bologna"), "team_api_id"] = 9857
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Brescia"), "team_api_id"] = 9858
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Cagliari"), "team_api_id"] = 8529
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Carpi"), "team_api_id"] = 208931
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Catania"), "team_api_id"] = 8530
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Cesena"), "team_api_id"] = 9880
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Chievo"), "team_api_id"] = 8533
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Empoli"), "team_api_id"] = 8534
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Fiorentina"), "team_api_id"] = 8535
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Frosinone"), "team_api_id"] = 9891
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Genoa"), "team_api_id"] = 10233
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Verona"), "team_api_id"] = 9876
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Inter"), "team_api_id"] = 8636
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Juventus"), "team_api_id"] = 9885
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Lazio"), "team_api_id"] = 8543
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Lecce"), "team_api_id"] = 9888
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Livorno"), "team_api_id"] = 8537 
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Milan"), "team_api_id"] = 8564
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Napoli"), "team_api_id"] = 9875
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Novara"), "team_api_id"] = 6269
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Palermo"), "team_api_id"] = 8540
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Parma"), "team_api_id"] = 10167
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Pescara"), "team_api_id"] = 9878
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Reggina"), "team_api_id"] = 8690
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Roma"), "team_api_id"] = 8686
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Sampdoria"), "team_api_id"] = 9882
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Sassuolo"), "team_api_id"] = 7943
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Siena"), "team_api_id"] = 8551
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Torino"), "team_api_id"] = 9804
elo_rating.loc[(elo_rating.Country == "ITA") & (elo_rating.Club == "Udinese"), "team_api_id"] = 8600
## France Ligue 1

elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Ajaccio"), "team_api_id"] = 8576
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Arles-Avignon"), "team_api_id"] = 108893
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Auxerre"), "team_api_id"] = 8583
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Monaco"), "team_api_id"] = 9829
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Nancy"), "team_api_id"] = 8481
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Saint-Etienne"), "team_api_id"] = 9853
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Angers"), "team_api_id"] = 8121
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Dijon"), "team_api_id"] = 9836
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Troyes"), "team_api_id"] = 10242
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Guingamp"), "team_api_id"] = 9747
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Lorient"), "team_api_id"] = 8689
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Metz"), "team_api_id"] = 8550
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Nantes"), "team_api_id"] = 9830
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Sochaux"), "team_api_id"] = 9874
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Gazelec"), "team_api_id"] = 6391
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Bordeaux"), "team_api_id"] = 9827
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Grenoble"), "team_api_id"] = 9855
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Lille"), "team_api_id"] = 8639
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Le Havre"), "team_api_id"] = 9746
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Le Mans"), "team_api_id"] = 8682
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Montpellier"), "team_api_id"] = 10249
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Nice"), "team_api_id"] = 9831
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Lyon"), "team_api_id"] = 9748
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Marseille"), "team_api_id"] = 8592
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Paris SG"), "team_api_id"] = 9847
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Lens"), "team_api_id"] = 8588
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Bastia"), "team_api_id"] = 7794
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Caen"), "team_api_id"] = 7819
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Brest"), "team_api_id"] = 8521
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Rennes"), "team_api_id"] = 9851
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Reims"), "team_api_id"] = 9837
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Toulouse"), "team_api_id"] = 9941
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Boulogne"), "team_api_id"] = 4170
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Valenciennes"), "team_api_id"] = 9873
elo_rating.loc[(elo_rating.Country == "FRA") & (elo_rating.Club == "Evian TG"), "team_api_id"] = 4087
## Scotland Premier League

elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "Aberdeen"), "team_api_id"] = 8485
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "Celtic"), "team_api_id"] = 9925
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "Dundee"), "team_api_id"] = 8284
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "Dundee United"), "team_api_id"] = 9938
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "Dunfermline"), "team_api_id"] = 8457
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "Falkirk"), "team_api_id"] = 8596
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "Hamilton"), "team_api_id"] = 8429
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "Hearts"), "team_api_id"] = 9860
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "Hibernian"), "team_api_id"] = 10251
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "Inverness"), "team_api_id"] = 8066
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "Kilmarnock"), "team_api_id"] = 8597
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "Motherwell"), "team_api_id"] = 9927
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "Partick"), "team_api_id"] = 8426
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "Rangers"), "team_api_id"] = 8548
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "Ross County"), "team_api_id"] = 8649
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "St Johnstone"), "team_api_id"] = 8467
elo_rating.loc[(elo_rating.Country == "SCO") & (elo_rating.Club == "St Mirren"), "team_api_id"] = 9800
## Switzerland Super League

elo_rating.loc[(elo_rating.Country == "SUI") & (elo_rating.Club == "Aarau"), "team_api_id"] = 9930
elo_rating.loc[(elo_rating.Country == "SUI") & (elo_rating.Club == "Basel"), "team_api_id"] = 9931
elo_rating.loc[(elo_rating.Country == "SUI") & (elo_rating.Club == "Bellinzona"), "team_api_id"] = 6493
elo_rating.loc[(elo_rating.Country == "SUI") & (elo_rating.Club == "Grasshoppers"), "team_api_id"] = 9956
elo_rating.loc[(elo_rating.Country == "SUI") & (elo_rating.Club == "Lausanne"), "team_api_id"] = 7730
elo_rating.loc[(elo_rating.Country == "SUI") & (elo_rating.Club == "Luzern"), "team_api_id"] = 10199
elo_rating.loc[(elo_rating.Country == "SUI") & (elo_rating.Club == "Servette"), "team_api_id"] = 9777
elo_rating.loc[(elo_rating.Country == "SUI") & (elo_rating.Club == "Sion"), "team_api_id"] = 10179
elo_rating.loc[(elo_rating.Country == "SUI") & (elo_rating.Club == "StGallen"), "team_api_id"] = 10190
elo_rating.loc[(elo_rating.Country == "SUI") & (elo_rating.Club == "Thun"), "team_api_id"] = 10191
elo_rating.loc[(elo_rating.Country == "SUI") & (elo_rating.Club == "Xamax"), "team_api_id"] = 7955
elo_rating.loc[(elo_rating.Country == "SUI") & (elo_rating.Club == "Young Boys"), "team_api_id"] = 10192
elo_rating.loc[(elo_rating.Country == "SUI") & (elo_rating.Club == "Zuerich"), "team_api_id"] = 10243
elo_rating.loc[(elo_rating.Country == "LIE") & (elo_rating.Club == "Vaduz"), "team_api_id"] = 9824
  • Save the elo_rating table.
elo_rating.to_csv("../data/elo_rating.csv", index = False)
  • Let’s calculate the home & away Elo rating difference of each match that will be used for modeling later.
  • First, get home teams’s Elo rating.
match_api_id = df_match_basic[["match_api_id", "match_date", "home_team_api_id", "away_team_api_id"]]
df_match_elo = match_api_id.rename(columns = {"home_team_api_id": "team_api_id"}).merge(elo_rating[["team_api_id", "match_date", "Elo"]], how = "left", on = ["team_api_id", "match_date"]).rename(columns = {"Elo": "home_elo", "team_api_id": "home_team_api_id"})
df_match_elo.isna().sum()
match_api_id          0
match_date            0
home_team_api_id      0
away_team_api_id      0
home_elo            269
dtype: int64
  • Some home teams don’t have Elo information for some match dates.
  • We will impute this missing values by filling most recent Elo rating of corresponding teams.
target_team_id = df_match_elo[(df_match_elo.home_elo.isna())][["home_team_api_id"]].drop_duplicates()
valid_target_team_id = target_team_id[target_team_id.team_api_id.isin(df_team.team_api_id.unique())]
target_club_name = elo_rating[elo_rating.team_api_id.isin(valid_target_team_id.team_api_id)].Club.unique()
target_club_all_elo = pd.DataFrame(columns = ["Rank", "Club", "Country", "Level", "Elo", "From", "To"])
for target_club in target_club_name:
    api_url = f"http://api.clubelo.com/{target_club}"
    
    response = requests.get(api_url)
    decoded_content = response.content.decode("utf-8")
    cr = csv.reader(decoded_content.splitlines(), delimiter = ",")
    
    my_list = list(cr)
    
    target_club_elo_rating = pd.DataFrame(my_list[1:], columns = my_list[0])
    
    target_club_all_elo = pd.concat([target_club_all_elo, target_club_elo_rating])
target_club_all_elo = target_club_all_elo[~target_club_all_elo.Club.isna()]
target_club_all_elo = target_club_all_elo.merge(elo_rating[["Club", "team_api_id"]].drop_duplicates(), how = "left", on = "Club")
target_df = df_match_elo[(df_match_elo.home_team_api_id.isin(valid_target_team_id.home_team_api_id)) & (df_match_elo.home_elo.isna())]
target_df_merged = target_df.merge(target_club_all_elo[["team_api_id", "To", "Elo"]], how = "left", left_on = "home_team_api_id", right_on = "team_api_id")
target_df_merged = target_df_merged[target_df_merged.match_date > target_df_merged.To]
imputed_elo = target_df_merged.sort_values(["home_team_api_id", "match_date", "To"]).groupby(["home_team_api_id", "match_api_id"]) \
                              .last().reset_index()[["match_api_id", "match_date", "home_team_api_id", "away_team_api_id", "Elo"]] \
                              .rename(columns = {"Elo": "home_elo"})
df_match_elo = pd.concat([df_match_elo[~df_match_elo.home_elo.isna()], imputed_elo])
  • Now, let’s get away teams’s Elo rating.
df_match_elo = df_match_elo.rename(columns = {"away_team_api_id": "team_api_id"}).merge(elo_rating[["team_api_id", "match_date", "Elo"]], how = "left", on = ["team_api_id", "match_date"]).rename(columns = {"Elo": "away_elo", "team_api_id": "away_team_api_id"})
df_match_elo.isna().sum()
match_api_id          0
match_date            0
home_team_api_id      0
away_team_api_id      0
home_elo              0
away_elo            261
dtype: int64
  • Some away teams don’t have Elo information for some match dates.
  • We will impute this missing values by filling most recent Elo rating of corresponding teams.
target_team_id = df_match_elo[(df_match_elo.away_elo.isna())][["away_team_api_id"]].drop_duplicates()
valid_target_team_id = target_team_id[target_team_id.away_team_api_id.isin(df_team.team_api_id.unique())]
target_club_name = elo_rating[elo_rating.team_api_id.isin(valid_target_team_id.away_team_api_id)].Club.unique()
target_df = df_match_elo[(df_match_elo.away_team_api_id.isin(valid_target_team_id.away_team_api_id)) & (df_match_elo.away_elo.isna())]
target_df_merged = target_df.merge(target_club_all_elo[["team_api_id", "To", "Elo"]], how = "left", left_on = "away_team_api_id", right_on = "team_api_id")
target_df_merged = target_df_merged[target_df_merged.match_date > target_df_merged.To]
imputed_elo = target_df_merged.sort_values(["away_team_api_id", "match_date", "To"]).groupby(["away_team_api_id", "match_api_id"]) \
                              .last().reset_index()[["match_api_id", "match_date", "home_team_api_id", "away_team_api_id", "home_elo", "Elo"]] \
                              .rename(columns = {"Elo": "away_elo"})
df_match_elo = pd.concat([df_match_elo[~df_match_elo.away_elo.isna()], imputed_elo])
df_match_elo.isna().sum()
match_api_id        0
match_date          0
home_team_api_id    0
away_team_api_id    0
home_elo            0
away_elo            0
dtype: int64
  • Now, there is no missing value in the home & away teams’ Elo information.
df_match_elo.dtypes
match_api_id         int64
match_date          object
home_team_api_id     int64
away_team_api_id     int64
home_elo            object
away_elo            object
dtype: object
  • Change the data type of the Elo to the float.
df_match_elo["home_elo"] = df_match_elo.home_elo.astype("float")
df_match_elo["away_elo"] = df_match_elo.away_elo.astype("float")

df_match_elo.dtypes
match_api_id          int64
match_date           object
home_team_api_id      int64
away_team_api_id      int64
home_elo            float64
away_elo            float64
dtype: object

2. EDA of the Elo rating

Q. What is the winning percentage of teams with larger Elo rating than the opponent team?

df_match_elo_with_result = df_match_elo.merge(df_match_basic[["match_api_id", "match_result"]], how = "left", on = "match_api_id")
home_bigger = df_match_elo_with_result[df_match_elo_with_result.home_elo > df_match_elo_with_result.away_elo][["match_api_id", "home_elo", "away_elo", "match_result"]]
home_bigger.loc[home_bigger.match_result == "home_win", "match_result"] = "win"
home_bigger.loc[home_bigger.match_result == "away_win", "match_result"] = "lose"
away_bigger = df_match_elo_with_result[df_match_elo_with_result.home_elo < df_match_elo_with_result.away_elo][["match_api_id", "home_elo", "away_elo", "match_result"]]
away_bigger.loc[away_bigger.match_result == "home_win", "match_result"] = "lose"
away_bigger.loc[away_bigger.match_result == "away_win", "match_result"] = "win"
bigger = pd.concat([home_bigger, away_bigger])
bigger.match_result.value_counts()
win     12596
lose     6518
draw     6507
Name: match_result, dtype: int64
sns.countplot(x = bigger.match_result, order = bigger.match_result.value_counts().index)
<AxesSubplot:xlabel='match_result', ylabel='count'>

png

  • A team with a higher Elo rating has an about 50% chance of winning against a team with a lower Elo rating.
df_match_elo_with_result[df_match_elo_with_result.home_elo < df_match_elo_with_result.away_elo].match_result.value_counts()
away_win    5064
home_win    4221
draw        3492
Name: match_result, dtype: int64
fig, axes = plt.subplots(1, 2, figsize = (12, 5))

sns.countplot(x = df_match_elo_with_result.match_result, ax = axes[0])
axes[0].set_title("All matches", fontsize = 15)

sns.countplot(x = df_match_elo_with_result[df_match_elo_with_result.home_elo < df_match_elo_with_result.away_elo].match_result, ax = axes[1])
axes[1].set_title("If away team's Elo > home team's Elo", fontsize = 15)
Text(0.5, 1.0, "If away team's Elo > home team's Elo")

png

  • If we look at the entire match without taking Elo into account, the away team wins about 30% of the time.
  • However, if we only consider matches where the away team has a higher Elo, we can see that the away team has a higher win rate.
  • That is, it can be seen that Elo has a greater impact on the match result than home or away.

3. Feature engineering for Elo rating.

  • Comparing the relative Elo between the two teams is more important than the absolute Elo value of each team.
  • So let’s use the Elo difference between the two teams as a variable (Home team Elo - Away team Elo) rather than the Elo of the two teams.
df_match_elo["elo_difference"] = df_match_elo.home_elo - df_match_elo.away_elo  
  • In addition, the elo at the time of the match is important, but the mid- to long-term elo can also be an important variable in determining how strong each team is.
  • So, let’s create variables to identify changes in each team’s mid-term and long-term Elo ratings:
    • each team’s last 1 / 3 / 5 / 10 / 20 / 30 / 60 / 90 matches
      • average Elo rating
      • standard deviation of Elo rating
elo_rolling_info = elo_rating[["team_api_id", "match_date", "Elo"]].dropna()
elo_rolling_info = elo_rolling_info.sort_values(["team_api_id", "match_date"])
for i in [1, 3, 5, 10, 20, 30, 60, 90]:
    elo_rolling_info[f"avg_elo_last_{i}_matches"] = elo_rolling_info.sort_values(["team_api_id", "match_date"]) \
                                                                    .groupby("team_api_id")["Elo"].shift(1).rolling(i).mean().values
    elo_rolling_info[f"std_elo_last_{i}_matches"] = elo_rolling_info.sort_values(["team_api_id", "match_date"]) \
                                                                    .groupby("team_api_id")["Elo"].shift(1).rolling(i).std().values
df_match_elo = df_match_elo.merge(elo_rolling_info.drop(["Elo"], axis = 1), how = "left", left_on = ["home_team_api_id", "match_date"], right_on = ["team_api_id", "match_date"])
for i in [1, 3, 5, 10, 20, 30, 60, 90]:
    df_match_elo = df_match_elo.rename(columns = {f"avg_elo_last_{i}_matches": f"home_team_avg_elo_last_{i}_matches",
                                                  f"std_elo_last_{i}_matches": f"home_team_std_elo_last_{i}_matches"})
df_match_elo = df_match_elo.merge(elo_rolling_info.drop(["Elo"], axis = 1), how = "left", left_on = ["away_team_api_id", "match_date"], right_on = ["team_api_id", "match_date"])
for i in [1, 3, 5, 10, 20, 30, 60, 90]:
    df_match_elo = df_match_elo.rename(columns = {f"avg_elo_last_{i}_matches": f"away_team_avg_elo_last_{i}_matches",
                                                  f"std_elo_last_{i}_matches": f"away_team_std_elo_last_{i}_matches"})
df_match_elo = df_match_elo.drop(["match_date", "home_team_api_id", "away_team_api_id", "home_elo", "away_elo", "team_api_id_x", "team_api_id_y"], axis = 1)
df_match_elo
match_api_id elo_difference home_team_avg_elo_last_1_matches home_team_std_elo_last_1_matches home_team_avg_elo_last_3_matches home_team_std_elo_last_3_matches home_team_avg_elo_last_5_matches home_team_std_elo_last_5_matches home_team_avg_elo_last_10_matches home_team_std_elo_last_10_matches home_team_avg_elo_last_20_matches home_team_std_elo_last_20_matches home_team_avg_elo_last_30_matches home_team_std_elo_last_30_matches home_team_avg_elo_last_60_matches home_team_std_elo_last_60_matches home_team_avg_elo_last_90_matches home_team_std_elo_last_90_matches away_team_avg_elo_last_1_matches away_team_std_elo_last_1_matches away_team_avg_elo_last_3_matches away_team_std_elo_last_3_matches away_team_avg_elo_last_5_matches away_team_std_elo_last_5_matches away_team_avg_elo_last_10_matches away_team_std_elo_last_10_matches away_team_avg_elo_last_20_matches away_team_std_elo_last_20_matches away_team_avg_elo_last_30_matches away_team_std_elo_last_30_matches away_team_avg_elo_last_60_matches away_team_std_elo_last_60_matches away_team_avg_elo_last_90_matches away_team_std_elo_last_90_matches
0 492473 -18.743896 1443.587769 NaN 1442.109660 1.324674 1441.677710 1.107800 1444.254285 4.062424 NaN NaN NaN NaN NaN NaN NaN NaN 1462.331665 NaN 1460.853556 1.324674 1460.421606 1.107800 1462.998193 4.062455 NaN NaN NaN NaN NaN NaN NaN NaN
1 492474 9.583740 1383.398315 NaN 1382.943888 0.393545 1382.853003 0.304839 1386.924109 4.970817 NaN NaN NaN NaN NaN NaN NaN NaN 1373.814697 NaN 1373.360270 0.393545 1373.269385 0.304839 1377.340466 4.970773 NaN NaN NaN NaN NaN NaN NaN NaN
2 492475 -131.862183 1467.186279 NaN 1466.731852 0.393545 1466.640967 0.304839 1470.712048 4.970773 NaN NaN NaN NaN NaN NaN NaN NaN 1599.048462 NaN 1598.594035 0.393545 1598.503149 0.304839 1610.064929 14.062292 NaN NaN NaN NaN NaN NaN NaN NaN
3 492476 99.138428 1471.177368 NaN 1467.407023 3.282954 1466.516626 2.622099 1468.749377 4.144453 NaN NaN NaN NaN NaN NaN NaN NaN 1372.038940 NaN 1370.560832 1.324674 1370.128882 1.107800 1372.705457 4.062424 NaN NaN NaN NaN NaN NaN NaN NaN
4 492478 -177.345459 1331.437012 NaN 1326.942546 7.784644 1322.007324 9.031994 1318.257568 7.304012 1329.887677 13.982247 1336.534269 15.118325 NaN NaN NaN NaN 1508.782471 NaN 1502.902832 5.091916 1499.274072 7.513642 1496.358435 6.254508 1496.439099 4.995854 1498.021908 5.532603 NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
25624 506683 143.998901 1343.611572 NaN 1344.436035 1.428012 1345.095605 1.354731 1345.380432 1.138978 1346.061859 1.161544 1346.475037 1.112274 1346.788143 1.139500 1341.886538 7.693840 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
25625 506697 68.466553 1268.079224 NaN 1268.079224 0.000000 1267.240454 1.148532 1262.945789 6.402012 1257.423682 8.081155 1253.740869 8.425476 1250.267346 6.867502 1249.442734 6.745897 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
25626 506712 315.905884 1515.518555 NaN 1514.165568 2.343442 1513.083179 2.223184 1512.841760 1.656324 1518.017987 6.141389 1519.505819 5.653916 1523.068176 5.387709 1521.553478 7.638267 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
25627 506730 301.601074 1501.213745 NaN 1502.005208 1.370855 1502.638379 1.300507 1503.113257 1.001131 1500.035645 3.907609 1499.129964 5.309604 1504.506095 6.778629 1506.834241 6.466452 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
25628 506752 78.508667 1278.121338 NaN 1278.121338 0.000000 1274.224854 5.335481 1272.146313 4.302022 1269.433453 4.856444 1271.037667 4.883669 1281.986029 12.858604 1290.397424 15.896633 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

25629 rows × 34 columns

  • Save the table.
df_match_elo.to_csv("../data/df_match_elo.csv", index = False)

4. Summary

  • Create the elo difference variable (Home team Elo - Away team Elo)
  • Create variables to identify changes in each team’s mid-term and long-term Elo ratings:
    • each team’s last 1 / 3 / 5 / 10 / 20 / 30 / 60 / 90 matches
      • average Elo rating
      • standard deviation of Elo rating

4.summary