Introduction to advanced statistics and analysis, part 2



Last week, in Part 1 of this Primer on Advanced Statistics and Analysis, we looked at some statistics and resources that cover the foundations of what we may call “advanced statistics” or “analysis” in hockey. To recap:

  • Corsi (CF%) measures the share of shot attempts (shots on goal, misses, blocked shots) that go in a team's favour while a player is on the ice
  • Production (points, goals, assists) can be measured in different ways (primary points, 5v5 points, etc.)
  • Expected Goals (xGF%) is similar to Corsi, but it weights shot attempts by their quality, based on where the shot was taken
  • Shot attempt metrics like Corsi and Expected Goals correlate quite well with future wins, which is why they are important.
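
For anyone who likes to see the arithmetic, here is a minimal sketch of the two percentages in the recap. The function names and the on-ice numbers are made up for illustration, and the per-shot probabilities stand in for whatever a real expected goals model would produce.

```python
def corsi_for_pct(attempts_for, attempts_against):
    """Share of all on-ice shot attempts that went in the player's favour."""
    total = attempts_for + attempts_against
    if total == 0:
        raise ValueError("no shot attempts recorded")
    return 100.0 * attempts_for / total


def expected_goals_for_pct(xg_for, xg_against):
    """Same idea as Corsi, but each attempt counts for its goal probability.

    xg_for and xg_against are lists of per-shot probabilities (0 to 1).
    """
    total = sum(xg_for) + sum(xg_against)
    if total == 0:
        raise ValueError("no expected goals recorded")
    return 100.0 * sum(xg_for) / total


# Hypothetical on-ice numbers, not real NHL data.
cf = corsi_for_pct(attempts_for=55, attempts_against=45)
xgf = expected_goals_for_pct(
    xg_for=[0.30, 0.05, 0.10],      # three attempts, one from a dangerous spot
    xg_against=[0.02, 0.03, 0.05],  # three low-quality attempts against
)
print(f"CF% = {cf:.1f}, xGF% = {xgf:.1f}")
```

Both sides take three attempts in the expected goals example, so the raw attempt count is even, but the quality weighting tilts the percentage heavily toward the team with the better looks.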

In this article, I’m going to step away from that baseline a bit and talk about the many other stats available for the NHL. It is therefore important that you are comfortable with everything in this summary before proceeding to the next step.

The beauty of publicly available data is that it takes a ton of grassroots effort to get there. Please consider donating to the creators of these tools so that you can support them and have access to them as well. The Patreon links for creators will also be in their specific sections, if you want to decide later, but here they are in advance to help you:


The majority of the stats we discussed in Part 1 are taken from the game logs published by the NHL. You could develop an advanced statistics model without ever having watched a hockey game, if you wanted to, using these numbers. Manual tracking is something different.

Manual tracking involves watching a game and keeping a record of certain events that occur for each player. This can cover a number of things, but the two main uses are tracking how the puck is carried into or out of a zone and tracking passes, especially those that lead to shot attempts. Ryan Stimson was a major player in the field of passing statistics, and although his data is no longer publicly available, you can read his book, which covers this topic and much more. Or, you can jump into the Wayback Machine and read some of Ryan’s old blog posts on Hockey Graphs. The first is this one, which shows how shot assists (passes that lead to shot attempts) are a good predictive measure. Then you can see how you might use that data, focusing on our Toronto Maple Leafs from yesteryear:

The other major player in this area is Corey Sznajder, who got his start tracking zone entries and exits. He now also covers passing statistics. Basically, Corey watches every NHL game, tracks everything in it, and then puts it into some nice, readable spreadsheets that we can use. We can look at the raw data itself, which is just a list of all the events that happen, but there are also much easier ways to interact with it. All of this is now accessible from his website, which launched yesterday, AllThreeZones.com.

For example, with the games tracked for the 2020-21 season, this visualization shows that Auston Matthews and Mitch Marner are outliers in terms of the number of chances they take and create, respectively, while William Nylander is in the upper tier of both. I’m going to give you three guesses as to which Edmonton Oiler is hanging out alone in elite territory, and the first two guesses don’t count.


Likewise, this visualization shows me that William Nylander and Alexander Kerfoot were the best on the team at entering the opposing zone without turning the puck over:


There is a wealth of data for the 2020-21 season and for historical seasons going back to 2016, available at this site. There are, however, two major caveats with this data to be aware of:

  1. There are biases at play here. Trackers are as unbiased as they can be, but it would be practically impossible to be completely unbiased.
  2. We have yet to prove the predictive value of statistics like this. We know that zone entries lead to the creation of shot attempts, as demonstrated by Charlie O’Connor. We also know that winning the shot attempt battle leads to victories, from last week’s post. But zone entries and exits are only one component of what goes into successful shot attempts, mainly on the offensive side (zone entries have limited impact on preventing shot attempts).

For these reasons, this data should only be used in certain circumstances where it can complement a nuanced and in-depth analysis.

“Viz” is a colloquial term for data visualizations, and we find it quite often on hockey stats forums, including the ones I’ve shared above. One of the main contributors of these visualizations is Micah Blake McCurdy, who uses his site hockeyviz.com to host his tools. The good stuff is behind a paywall, so check out Micah’s Patreon to contribute and get access.


The visualizations you can find on Micah’s website include:

  • Match Simulator, a fun tool to pit user-created teams against each other to see who has the best chance of winning. Example using the 2018-19 teams for Toronto and Montreal:

  • A PP shot locator tool to show you where players take their shot attempts on the power play (in this case not counting blocked shots, which is typical when looking at power plays, as blocks can be extremely detrimental to the overall power-play setup). Examples of Matthews and Barrie charts from 2019-2020:

  • An “environment distiller” where you can isolate the impacts of certain players on each other (commonly referred to as With Or Without You, or WOWY). Example of the offensive picture with Matthews and Marner versus with Matthews and Nylander:

  • An individual shot map generator, showing you where certain players are taking their shots. Examples with Marner and Nylander from 2016 to 2020:

There are even more tools than the ones I’ve shown here, but these are the main ones. You can also see how particular games have gone, how particular teams are faring, and more. I wanted to update these with 2020-21 data, but my site subscription needs to be renewed, so unfortunately you are stuck with outdated visualizations.

These visualizations are very useful because they take data and put it in a medium that is easier for many people to understand. I’m a visual learner and often learn new advanced stat concepts from viz like these. I hope you can do the same.

This is where we get into the heavy math stuff, so feel free to skip this part if you don’t want to hear about it. The “executive summary” for this section is this: math nerds take all the data we have, put it in a big data analysis box, and it spits out a number that shows how certain players are doing. The data analysis box is called a “regression” and the whole process is called “modelling”.

The things we are going to talk about come from evolving-hockey.com, a site run by a pair of twin brothers who love the Minnesota Wild (@EvolvingWild on Twitter). These twins are our math nerds, and they have developed two models for the NHL.

The first model is Regularized Adjusted Plus-Minus (RAPM). This model is inspired by a similar model developed for the NBA. In the NBA, the model attempts to predict points scored per 100 possessions, but for hockey we have learned over time that points are not a reliable predictor of future points (the sources on this have dried up; it has become more of a community rule of thumb). The RAPM process can be used to predict any variable you want. In this case, the model tries to predict future Corsi, knowing that Corsi predicts future goals and that goals predict future wins. For example, one target variable that the RAPM model can be used to predict is Corsi For per 60 minutes.
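
To make the “box” a little less mysterious, here is a minimal sketch of the regularized regression idea using NumPy. This is not the twins’ actual model: the stints, player columns, target values, and penalty strengths are all invented for illustration, the target is assumed to be Corsi For per 60 already centred on league average, and a real RAPM includes many more inputs than who was on the ice.

```python
import numpy as np


def ridge_rapm(X, y, weights, lam):
    """Weighted ridge regression: solve (X'WX + lam*I) b = X'Wy.

    X       : stints x players, 1 if the player was on the ice for that stint
    y       : target per stint (here, centred Corsi For per 60)
    weights : stint lengths, so longer stints count for more
    lam     : penalty strength; larger values pull ratings toward zero
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    XtW = X.T * w  # multiplies each stint's column of X' by its weight
    A = XtW @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, XtW @ y)


# Hypothetical toy data: 4 stints, 3 players (columns A, B, C).
X = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
y = [10.0, 4.0, -2.0, 8.0]        # made-up CF/60 relative to league average
minutes = [2.0, 1.0, 1.5, 2.0]    # made-up stint lengths

light = ridge_rapm(X, y, minutes, lam=0.1)
heavy = ridge_rapm(X, y, minutes, lam=10.0)
print("light penalty:", np.round(light, 2))
print("heavy penalty:", np.round(heavy, 2))
```

The “regularized” part is the penalty: with a heavier penalty every rating shrinks toward zero, which keeps players with only a handful of minutes from getting extreme ratings.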

In order to predict this variable, we put a number of different things in the “box”:

After that, the twins went further and developed a second model. Since RAPM can predict future Corsi, and we know that future Corsi predicts future goals, we can create another model to show which players are likely to produce future goals. This gives us the second model, Goals Above Replacement (GAR). This model aims to give a number of goals that a player could help create, after taking into account all kinds of factors such as the shot attempts they take, who they are playing with, and more. The number of goals is expressed as positive or negative, relative to a “replacement level” player. Colloquially, this is your average call-up who can contribute at the NHL level but is not to be relied on. Finally, you can use this GAR model to predict how many Wins Above Replacement (WAR) a player will contribute, similar to the WAR models in baseball.
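
The last step, going from goals to wins, is conceptually just a division by a goals-per-win factor. The sketch below assumes that simple form; the function name is made up and the factor used in the example is a placeholder, not the value the twins actually use.

```python
def gar_to_war(gar, goals_per_win):
    """Convert Goals Above Replacement to Wins Above Replacement.

    goals_per_win is an assumed conversion factor supplied by the caller.
    """
    if goals_per_win <= 0:
        raise ValueError("goals_per_win must be positive")
    return gar / goals_per_win


# Placeholder numbers for illustration only.
war = gar_to_war(gar=12.0, goals_per_win=6.0)
print(f"WAR = {war:.1f}")
```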

When the model spits out its results, this is usually how to interpret them over 82 games:

This article covers many different topics from a few different resources, and I understand that it can be overwhelming. Feel free to dive into one of these three resources, take the time to familiarize yourself with it, then move on to the next. You can always come back to this post if you need help with framing, or just want links to the different sites.

These are the main “extras” that I will use in future Staturday columns to tell stories about players and teams. I’ll always include links to the two introductory articles at the bottom if you want to come back here and remember how to get to something or what the definition of something is.

At the end of the day, data analytics in hockey is such a dynamic field that at any point in time one of these people could be hired by a team and all of their work could disappear, so we have to use what we have while we have it. New things will keep popping up as people try to innovate in this space, and if something cool happens, I’ll be sure to write a column about it to show you all.

That’s pretty much it for this week. Next week will be the third and final part of this introductory series, where we break down the stats resources for women’s hockey, including the PWHPA and the PHF (formerly NWHL).


