From this dataframe, I want to calculate different stats at the team level
data = [['20-10-2020', 'PSG', 'Man U', 1, 2], ['20-10-2020', 'Leipzig','Istanbul',2,0], ['27-10-2020', 'Istanbul','PSG',0,2], ['27-10-2020', 'Man U','Leipzig',5,0]]
df = pd.DataFrame(data, columns = ['Date', 'Home', 'Away', 'HG', 'AG'])
print(df)
Date Home Away HG AG
0 20-10-2020 PSG Man U 1 2
1 20-10-2020 Leipzig Istanbul 2 0
2 27-10-2020 Istanbul PSG 0 2
3 27-10-2020 Man U Leipzig 5 0
For instance, for each team, I calculate the number of points and goals from the previous game. The naive implementation creates two dataframes, one for the home teams and one for the away teams and concatenate them. I tried using melt
but I did not find the syntax to achieve the dataframe I wanted.
df_home = df.reset_index(level=0)
columns = {
"Date": 'date',
"Home": "team",
"Away": "opponent",
'HG': 'team_goals',
'AG': 'opponent_goals',
}
df_home = df_home.rename(columns=columns)
df_home['site'] = 'H'
df_away = df.reset_index(level=0)
columns = {
"Date": 'date',
"Home": "opponent",
"Away": "team",
'HG': 'opponent_goals',
'AG': 'team_goals',
}
df_away = df_away.rename(columns=columns)
df_away['site'] = 'A'
df_team = pd.concat([df_home, df_away], ignore_index=True).sort_values(['date'])
df_team['team'] = df_team['team'].astype('category')
df_team['opponent'] = df_team['opponent'].astype('category')
print(df_team)
index date team opponent team_goals opponent_goals site
0 0 20-10-2020 PSG Man U 1 2 H
1 1 20-10-2020 Leipzig Istanbul 2 0 H
4 0 20-10-2020 Man U PSG 2 1 A
5 1 20-10-2020 Istanbul Leipzig 0 2 A
2 2 27-10-2020 Istanbul PSG 0 2 H
3 3 27-10-2020 Man U Leipzig 5 0 H
6 2 27-10-2020 PSG Istanbul 2 0 A
7 3 27-10-2020 Leipzig Man U 0 5 A
With this dataframe I can calculate the stats based on the team
column
conditions = [df_team['team_goals'] > df_team['opponent_goals'], df_team['team_goals'] == df_team['opponent_goals']]
choices = [3, 1]
df_team['pts'] = np.select(conditions, choices, default=0)
f = lambda x: x.shift(1).rolling(1).sum()
df_team['form_l1_before'] = df_team.groupby(['team'])['pts'].apply(f)
df_team['goal_l1_before'] = df_team.groupby(['team'])['team_goals'].apply(f)
print(df_team)
index date team opponent team_goals opponent_goals site
0 0 20-10-2020 PSG Man U 1 2 H
1 1 20-10-2020 Leipzig Istanbul 2 0 H
4 0 20-10-2020 Man U PSG 2 1 A
5 1 20-10-2020 Istanbul Leipzig 0 2 A
2 2 27-10-2020 Istanbul PSG 0 2 H
3 3 27-10-2020 Man U Leipzig 5 0 H
6 2 27-10-2020 PSG Istanbul 2 0 A
7 3 27-10-2020 Leipzig Man U 0 5 A
pts form_l1_before goal_l1_before
0 0 NaN NaN
1 3 NaN NaN
4 3 NaN NaN
5 0 NaN NaN
2 0 0.0 0.0
3 3 3.0 2.0
6 3 0.0 1.0
7 0 3.0 2.0
The problem is that I want to convert that dataframe back with one row per game (identified by the index
column) and each stats has its own columns
# Ex second game for Istanbul and PSG with stats from the previous game
expected_data = [['27-10-2020', 'Istanbul','PSG',0,2,0,0,0,1]]
df_target = pd.DataFrame(expected_data, columns = ['date', 'Home', 'Away', 'HG', 'AG', 'Home_form_l1_before', 'Home_goal_l1_before', 'Away_form_l1_before', 'Away_goal_l1_before'])
print(df_target)
date Home Away HG AG Home_form_l1_before
0 27-10-2020 Istanbul PSG 0 2 0
Home_goal_l1_before Away_form_l1_before Away_goal_l1_before
0 0 0 1