PyCon 2016 in Portland, OR

Tuesday 2:35 p.m.–3:05 p.m.

When is it good to be bad? Web scraping and data analysis of NHL penalties

Wendy Grus

Audience level:


On Jan. 20, Philadelphia Flyers forward Zac Rinaldo was ejected from a game after boarding Penguins defenseman Kris Letang. The Flyers came back to win. After the game, Rinaldo said he "changed the game" (for which he was suspended 8 games). Using Python for web scraping and data analysis, I explore data from 10 NHL seasons to investigate how hockey penalties affect the outcome of the game.


In hockey, most penalties force your team to play down a player, giving a decisive advantage to your opponents. Here I explore whether there is ever a case when penalties are an advantage. (Unfortunately, I won't be talking about hockey fights, as those usually result in a player lost for both teams - you can't fight yourself!) This talk will focus primarily on the process of getting the penalty, player, team, and game data and less on data analysis. At the end, I will describe regression models of how getting a penalty affects the outcome of the game for your team, i.e., "When is it good to be bad?"

I explored NHL penalty data from the past 8 complete seasons. I built parsers to extract the same team, player, and penalty information for each game from the play-by-play recaps. I also got player stats from the same seasons so that player strength could be included in the models. I will discuss some examples of data joining and data cleaning.

For the models, I will look at three outcomes: (1) positive change, where the state of the game for the team who got the penalty improves; (2) next goal, where the team who got the penalty scores the next goal; and (3) score differential, the change in score differential from the time of the penalty to the end of regulation.
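
As a rough illustration of outcomes (2) and (3), here is a minimal sketch of how they might be computed once a game's play-by-play has been parsed into a list of events. The `Event` record, its field names, the team labels "A" and "B", and the sample game are all hypothetical and made up for this example; they are not the talk's actual parsers or data. The sketch also assumes the score differential is measured from the penalized team's point of view.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Event:
    """One parsed play-by-play event (hypothetical schema)."""
    seconds: int  # seconds elapsed since the start of regulation
    kind: str     # "goal" or "penalty"
    team: str     # team that scored, or team that was penalized


def score_diff(events: List[Event], team: str, up_to: int) -> int:
    """Goals for `team` minus goals against, counting goals at or before `up_to`."""
    diff = 0
    for e in events:
        if e.kind == "goal" and e.seconds <= up_to:
            diff += 1 if e.team == team else -1
    return diff


def next_goal_by_penalized_team(events: List[Event], penalty: Event) -> Optional[bool]:
    """Outcome (2): True if the penalized team scores the next goal.

    Returns None when no goal is scored after the penalty.
    """
    later_goals = [e for e in events
                   if e.kind == "goal" and e.seconds > penalty.seconds]
    if not later_goals:
        return None
    first = min(later_goals, key=lambda e: e.seconds)
    return first.team == penalty.team


def score_differential_change(events: List[Event], penalty: Event,
                              regulation_end: int = 3600) -> int:
    """Outcome (3): change in the penalized team's score differential
    from the time of the penalty to the end of regulation (60 minutes)."""
    before = score_diff(events, penalty.team, penalty.seconds)
    after = score_diff(events, penalty.team, regulation_end)
    return after - before


# Made-up game: team A is penalized at 15:00 while trailing 0-1.
events = [
    Event(300, "goal", "B"),
    Event(900, "penalty", "A"),
    Event(1000, "goal", "B"),
    Event(2000, "goal", "A"),
    Event(3000, "goal", "A"),
]
penalty = events[1]

next_goal = next_goal_by_penalized_team(events, penalty)
diff_change = score_differential_change(events, penalty)
print(next_goal, diff_change)
```

In this toy game the opponent scores first after the penalty, so the next-goal outcome is negative, yet the penalized team gains one goal of differential by the end of regulation. That is why the two outcomes are worth modeling separately.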