Data analysis has changed sports forever.
Sandy Alderson, the first baseball general manager credited with taking a quantitative approach to the game, focused on principles known as “sabermetrics”. A core idea of sabermetrics is that a large part of a player’s value lies in how much he helps his team score more runs than the opposition.
As any serious fantasy sports player or sports gambler will tell you, data analytics is not only for general managers and their number crunchers these days. Thousands of websites are available to help anyone with a spreadsheet figure out the best ways to beat their buddies or the oddsmakers.
There are huge opportunities for data analysis in soccer, and we may just be starting to scratch the surface. Arguably, soccer is still looking for breakthrough statistics of the kind that WHIP and OBP became in baseball. One of the newer experiments is Expected Goals modeling, known to the geeks as xG. An xG model draws on tens of thousands of earlier shots to estimate the probability that a particular shot will find the back of the net. Note that xG is not a definitive metric; it is a derived metric that depends on whichever model is used in the calculation. Given that multiple websites each have their own xG model, it is clear soccer analysis has not yet arrived.
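To make the idea concrete, here is a minimal sketch of how an xG-style number could be derived from historical shots. It is not the ASA model or any other published model: the shot list is made up for illustration, the three distance buckets are our own assumption, and real models use far more shots and more factors (shot angle, for one).

```python
from collections import defaultdict

# Hypothetical historical shots: (distance from goal in yards, was it a goal?).
# These values are invented for illustration; they are not Opta or ASA data.
HISTORICAL_SHOTS = (
    [(6, True)] * 2 + [(6, False)] * 2      # close range: 2 goals from 4 shots
    + [(14, True)] + [(14, False)] * 3      # medium range: 1 goal from 4 shots
    + [(25, True)] + [(25, False)] * 9      # long range: 1 goal from 10 shots
)


def distance_bucket(distance_yards):
    """Group shots into coarse distance bands (an assumed, simplistic split)."""
    if distance_yards < 10:
        return "close"
    if distance_yards < 20:
        return "medium"
    return "long"


def build_xg_table(shots):
    """Estimate xG per bucket as goals scored divided by shots taken."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for distance, scored in shots:
        bucket = distance_bucket(distance)
        attempts[bucket] += 1
        goals[bucket] += int(scored)
    return {bucket: goals[bucket] / attempts[bucket] for bucket in attempts}


def shot_xg(table, distance_yards):
    """Look up the historical scoring rate for a new shot."""
    return table[distance_bucket(distance_yards)]


xg_table = build_xg_table(HISTORICAL_SHOTS)

# A team's xG for a match is the sum of the xG of each of its shots.
match_shot_distances = [5, 14, 30, 8]
match_xg = sum(shot_xg(xg_table, d) for d in match_shot_distances)

print(xg_table)
print(round(match_xg, 2))
```

Because the table is built from whatever shots and buckets the modeler chooses, two sites can feed in the same match and report different xG totals, which is exactly why xG is a derived metric rather than a definitive one.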
Even more clear is that no metric comes close to xG for evaluating players whose job on the field is not specifically to score goals. Keepers now have their own Expected Goals Against model. Midfielders have xG Chain, xPass, xBuildup Goal Chain and others, but none of these are as fleshed out yet as xG. So given that even xG is an insufficient statistic for evaluating forwards, we are nowhere close to having a fully robust set of models for evaluating all key aspects of player performance.
Unfortunately, if you don’t have a (probably very expensive) license to the official Opta MLS source data, it can be difficult to gain access to the information needed for high-quality analysis. Many MLS clubs have not even hired dedicated data analysts yet. Given this, where is the breakthrough analysis going to come from if unpaid, part-time analysts don’t have access to the richer data set? xG can’t even be properly calculated without shot and player/keeper positional x/y coordinate data, which is only available with the Opta data set.
So what does this mean to Quakes Epicenter? In the past couple months, our existing and new writers have been having conversations about how we can bring high-quality data analysis and insight to fans of the San Jose Earthquakes. We believe that many questions about the on-field performance of the Quakes can be better answered by the public data available to us. We are even looking at enriching this data in unique ways. Answering these questions to the best of our ability will hopefully lead to more meaningful questions and answers. Not every article will go in-depth with data analytics, but you will see more of this in specific types of content we intend to provide.
While we can easily surmise that the marriage of video analysis and data analysis is likely where the future lies (and Jesse Fioranelli has already been getting the Earthquakes into this area with the Second Spectrum partnership), it is still nascent and expensive. In the meantime, here is the approach we are taking to our data analysis:
- Build a repository of data on the Quakes at the team level and at the player level based on publicly available sources (at the moment, we have 2017 and 2018 data compiled from four sources).
- Ask questions based on the data we have. See if we can figure out the answers.
- Go deeper with cool charts, pivots and comparisons. See if what we find is interesting.
- If it is interesting, we will write about it for Quakes Epicenter.
Perhaps you saw the Orlando City preview article from one of our new writers, Asher Kohn, titled “Quakes go cross-country to prep for shootout”. In it, Asher compared selected Earthquakes central midfielders, using 2017 and 2018 MLS league data we have summarized from WhoScored.com and AmericanSoccerAnalysis.com (ASA). Here are the tables from Asher’s article again:
Note: If you are not sure what some of these statistics mean, check out the handy glossary below!
One use of our data repository is at a feature article level — such as Asher’s — but future data analysis articles could go much deeper into exactly what data like this is telling us. We hope you’ll look forward to seeing more of this type of unique insight from Quakes Epicenter in the coming weeks.
There are many terms which have general meanings in a soccer match but have specific meaning within soccer statistics, such as “cross” and “touch”. We will be using some acronyms and the statistical versions of these match terms in these articles, so here is a glossary of many of them:
- GF – Goals For. We know you know what this means.
- GA – Goals Against. We know you know what this means, too.
- xG – Expected Goals. The probability, estimated from historical data, that a shot results in a goal, based on a number of factors such as distance from goal and angle of shot. xG can be reported for a single shot or summed across multiple shots, usually as a game or season total. There are many xG models, and they are being “improved” all the time. The freely available one we have access to, built on MLS Opta data, is from ASA. ASA also have an xG model for goalkeepers for evaluating expected goals against.
- xGp – Expected Goals by a particular player.
- xGt – Expected Goals by a team. This value is not the same as the sum of a team’s xGp in a match due to reasons provided here. xGtH is for home teams, xGtA is for away teams.
- xA – Expected Assists. Awarded based on the xGp of a key pass. This is the ASA definition used with MLS Opta data. Other definitions exist but unfortunately data using these definitions is not available to the MLS-watching general public.
- xG Chain – Expected Goals Chain. The xG value is awarded to each player involved in the completed passes which lead to a shot, including the shooter and pass to the shooter.
- xBuildup GC – Expected Buildup Goals Chain. Similar to xG Chain, but does not award xG value to the player shooting or making the final pass (assist or potential assist) to the shooter.
- xPass – Expected Pass. Usually used to compare actual to expected data.
- SOT – Shot on Target. This is not awarded when a shot hits the crossbar or post. Also called “Shot on Goal”.
- Duels (Won or Lost) – When opposing players challenge for a ball and one player gains possession and the other does not.
- Aerial Duels / Aerials (Won or Lost) – A subset of duels won or lost by a headed ball or (goalkeeper only) handled ball.
- Key Pass – A pass which is followed by a shot, a shot on target or a goal.
- Big Chance (Created or Missed) – A situation where a player should reasonably be expected to score, usually in a one-on-one scenario or from very close range. Big Chance Created is credited to the potential assist which generates a Big Chance. A Big Chance Missed is recorded when the targeted player does not score from a Big Chance.
- Touch – The initial contact of a player on the ball after it was contacted by another player or from a dead-ball situation.
- Tackle – Dispossessing an opponent, whether the tackling player comes away with the ball or not.
- Dribble (Successful or Unsuccessful) – An attempt by a player to get away from an opposing player while maintaining possession of the ball. Also called a “Take-on”.
- Dispossessed – A tackle by an opponent without the attacker attempting to dribble past them.
- Turnover – Loss of possession due to a mistake or poor control, often out-of-bounds to a throw-in, goal kick or corner kick. Also called a “Poor Touch”.
- Interception – A stolen attempted pass.
- Clearance – A ball which was never possessed and was kicked defensively away from the player’s goal.
- Long Ball – An attempted/completed longer pass, usually further up or across the pitch, that is not a cross. According to Opta’s definition, this is a pass of 35 yards or more.
- Thru Ball / Through Ball – An attempted/completed pass sent behind the last outfield defender.
- Cross – An attempted/completed pass to the central area of the penalty box, usually from the left or right of the penalty box.
- Block – Using the body to defend against an attempted shot, pass or cross.
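The difference between xG Chain and xBuildup GC in the glossary above is easier to see with a small worked example. The sketch below is only an illustration of those two definitions: the possessions, player labels and xG values are hypothetical, and crediting a player once per possession even if he touches the ball twice is our own assumption, not something taken from the ASA definitions.

```python
from collections import defaultdict

# Hypothetical possessions that ended in a shot. "players" lists everyone
# involved in the completed passes, in order; the last name is the shooter
# and the one before it made the final pass. All values are made up.
POSSESSIONS = [
    {"players": ["A", "B", "C", "D"], "shot_xg": 0.30},
    {"players": ["B", "D"], "shot_xg": 0.10},
]


def chain_totals(possessions):
    """Return (xG Chain, xBuildup GC) totals per player."""
    xg_chain = defaultdict(float)
    xg_buildup = defaultdict(float)
    for possession in possessions:
        players = possession["players"]
        xg = possession["shot_xg"]
        shooter = players[-1]
        final_passer = players[-2] if len(players) > 1 else None
        # Assumption: each player is credited once per possession.
        for name in set(players):
            # xG Chain: everyone involved gets the shot's xG.
            xg_chain[name] += xg
            # xBuildup GC: same, minus the shooter and the final passer.
            if name not in (shooter, final_passer):
                xg_buildup[name] += xg
    return dict(xg_chain), dict(xg_buildup)


chain, buildup = chain_totals(POSSESSIONS)
for name in sorted(chain):
    print(name, round(chain[name], 2), round(buildup.get(name, 0.0), 2))
```

In this made-up sample, player D finishes both moves and so collects xG Chain credit but no xBuildup GC at all, while A and B are rewarded for starting the first move. That is the kind of separation these metrics are meant to give deeper-lying midfielders.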
We hope you will look forward to our future content in this area.