Photo credit: ISI Photos. Jeremy Ebobisse celebrates one of his 17 goals in 2022 after scoring on a penalty kick against FC Dallas on September 17, 2022.
Special acknowledgment to Quakes Epicenter patron Trevor Wojcik, a data scientist in the economics field, for constructing a season simulation technique for me that will be used in this series and beyond. Trevor’s insightful work and feedback made this preseason series concept for 2023 possible.
Introduction
There are very few positives a San Jose Earthquakes fan can take from the 2022 MLS season. Not only was the team winless under Matias Almeyda until his dismissal, but they narrowly avoided the Wooden Spoon and finished with only a solitary road win. A promising US Open Cup run was halted by a bitter loss to Northern California rivals Sacramento Republic. The decision to name Luchi Gonzalez the next head coach while the season was progressing (and while Gonzalez was a USMNT assistant coach preparing for the World Cup) was baffling to many, including, reportedly, many players.
The few positives included a 17-goal season from a healthy Jeremy Ebobisse, seven goals and 14 primary and secondary assists from Cristian Espinoza, and the further emergence of young, potential stars-in-the-making like winger Benji Kikanovic (who is rumored to be transferring to AEK Athens as I write this), goalkeeper JT Marcinkowski, and, unexpectedly, outside back Paul Marie. Designated Player acquisition Jamiro Monteiro was all over the pitch on both sides of the ball. Unfortunately, many other regular starters struggled to make a consistent impact and a few even seemed to regress despite being in their prime years.
In the 2022 preseason, I said that the Quakes couldn’t concede more than 47 goals (1.38 per game) and make the playoffs under Almeyda given his history in both MLS and Liga MX. They proceeded to give up 15 goals in seven games (2.14 per game). They finished the season with 69 goals against (2.0 per game under interim head coach Alex Covelo) and a -17 goal differential.
However, they did score 52 goals, exceeding my expectations. And 52 goals minus 47 against would have been a +5 goal differential. Such an output would have almost certainly put them into the playoffs.
A league unlike any other
MLS is known for its chaos. American Soccer Analysis recently analyzed why MLS is more chaotic than, say, the European Big Five leagues. Let me borrow this chart from this recent fantastic article about Europe, Money, and the Problem with Disparity from my good friends Eliot McKinley, Sean Steffen, and Tiotal Football.
There are a lot of fancy words in there on the x- and y-axis descriptions, plus a scatterplot of points, that demonstrate Major League Soccer is not at all predictable like the “dynasty” and “haves and have-nots” leagues. If you are among those pointing to Matias Almeyda having AEK Athens in second place in the Greek Super League, the reason is not because of great coaching, but because he’s now in a 14-team league with a huge talent disparity — and he now has the top-tier talent.
In MLS, we have something known as “parity”. Talent and money mean something but, in the end, not nearly as much as in most leagues. While a “Moneyball” team hasn’t yet won an MLS Cup, Philadelphia won a Supporters Shield in 2020, Colorado won the Western Conference in 2021, and then Philadelphia came within a Gareth Bale header in the 128th minute of winning MLS Cup in 2022. It’s going to happen very soon.
Here in MLS, Almeyda added his personal brand of man-marking and restless defensive chaos on top of the inherent league chaos, and the results were a confusing mess of insane highs but even lower lows as the team struggled to protect leads even at home. With San Jose, he did not have the top-of-the-league talent that was needed to overcome the deficiencies of his defensive tactics like he did with Chivas Guadalajara when winning Liga MX and Concacaf Champions League.
In 37 home games since 2020, Almeyda had the second lowest points-per-game (PPG) of any MLS head coach with more than 20 home games (the lowest was Jaap Stam at FC Cincinnati who presided over two Wooden Spoons in 2020 and 2021). Almeyda’s 1.18 home PPG in 2021 was tied for the lowest of any coach who kept their job all 34 games since DC United’s Ben Olsen in 2013 (who won a US Open Cup at least).
Alex Covelo, wearing an interim label and inheriting only 3 points from 7 games, was able to restore respectability to the Quakes playing at PayPal Park, if not yet fear in their opponents. He posted the fourth-best home PPG (1.84) of any coach since 2013 who didn’t coach a full season. The Quakes won one more game at home than fifth-place Nashville did. Unfortunately, Covelo’s Quakes only won one game on the road. Almeyda won five games on the road in 2021, which would have been good enough for the playoffs had he won at least eight home games like the playoff teams did.
It’s preseason…again
In the immortal words of Bill Murray, “it’s Groundhog Day…again.” We’re less than a couple of months from the start of another MLS season, and I feel like I’m out here needing to say the same things again.
But this time, rather than just picking a goals-against number that feels like it would work along with the Quakes’ decent ability to score, I enlisted some help from Trevor Wojcik, a data scientist, to build an MLS season simulator to see if we could bring order to the chaos — a path forward for these Quakes that feel like they have been wandering in the MLS wilderness since 2012.
Even their two playoff appearances were in complete outlier seasons with -17 (2020 had eight playoff spots) and -21 (2017 had 11 teams in the West) goal differentials. The Quakes had a -17 goal differential in 2022.
At the severe risk of repeating myself, one of the biggest setbacks the Quakes have had is calling 2017 and 2020 “successes”. Those seasons were both fool’s gold. No two MLS teams since 2013 have gotten a better table position with less productivity as the following chart proves.
The Quakes have hired some smart people. I know that they know this has to change. I’m no longer preaching to the Front Office about data analytics. They’re making the investment.
What I am here to do is to inject some severe reality into the narratives (like the above) and then identify what needs to change, so that we can collectively track the progress toward what “good looks like” and then what “greatness looks like”. What we want to look for are repeatable patterns that have a higher likelihood of future success over one-game chance-based outcomes that lead to “fool’s gold” conclusions.
Enter Trevor Wojcik’s season simulator. We are actually going to Groundhog Day the 2022 season here. We are going to relive it ten thousand (10,000) times and see how often we get different results.
If you’d like to read the details about how Trevor created the season simulator itself and how it worked with teams other than the Quakes, keep reading to the end of the article. If advanced math and data science make you dizzy, then we’ll warn you when you can stop reading.
First, let’s take a look at the 2022 Earthquakes coaches.
What might have happened if Matias Almeyda continued the season and finished out his contract? Clearly, that was the original plan. Rather than count the first seven results against him, we’ll even let him play them over again, but we are going to count his 2022 results into the likely outcomes for our simulator. Since he only had seven games though, to be fair, we are going to put his 2021 season into the simulator as well. Let’s play 10,000 Almeyda 2022 seasons.
The gray bar shows the 35 points where the Quakes finished in 2022. The green bar shows the playoff line at Real Salt Lake’s 47 points. The simulator gave Almeyda a 3% chance to make the playoffs. THREE FRIGGIN’ PERCENT. That’s with 83% of his data points coming from 2021. In other words, we knew this before the 2022 season. Or at least with a basic season simulator like ours, we could have.
In one of the 10,000 alternate universes, the Quakes got 62 points, Chofis scored 15 goals and was re-signed for 2023 as a $2M DP, the Quakes got second in the West, and Almeyda was also re-signed to great fanfare after the Quakes’ first playoff win in 10 years. They got it done at home to boot. (That’s all fiction within our fiction as our simulator doesn’t actually tell us that, but it’s fun to imagine it.) But such an outcome would be fool’s gold for 2023. That is how you get another 2018 following the -21 GD playoff results of 2017. We can now predict future failure instead of future success following such an outlier season. In addition to this one 62-point season, Almeyda also got 1,435 Wooden Spoons in our simulations. Which one is more likely to happen in reality?
Granted, Almeyda had a 44% chance of beating the 2022 Quakes’ 35 points and getting the Quakes out of the gosh darn cellar in the West, but that’s not good enough. Nowhere close to good enough.
Now, what if the Quakes had Alex Covelo for a full season? Maybe even a preseason to get the Almeyda-isms out of the players’ heads? Well, we’ll never know how much the existing man-marking mindset affected Covelo’s ability to get results with the 2022 squad. However, with his penchant for home results, even a few results on the road could get the team into the playoffs, right? RIGHT? (Insert your own Anakin and Padme four-panel meme here.)
Not quite. While a full season of Covelo had an almost 10% chance of making the playoffs, and a 71% chance of not finishing dead last in the West, he also won the Wooden Spoon 289 times, while getting second in the West only 42 times. Just a higher chance of fool’s gold when it comes down to it. Maybe with a clean slate and a preseason to implement his defensive principles, he gets a better set of results. Maybe.
Improving the outcomes
Fortunately, by having a simulator, we can adjust its parameters to fake better outcomes. We can then determine what improvements are needed to make the playoffs on a more consistent basis. What we need to do is make that green bar at least appear in the center or, better yet, on the left side of our graph.
We’re gonna get a little geeky here, and then pull ourselves back out of it and explain more at the end for those who are curious.
Our simulator looks at four things:
- Home/Away Goals For and Goals Against
- Home/Away xG For and xG Against
- Home/Away Standard Deviation on Goals For/Against
- Home/Away Standard Deviation on xG For/Against
It was important that Trevor and I blend reality with probability, as too much of one or the other led to suspicious results when played out across the entire league. Standard deviation gives us a simple way, understood by anyone who has taken a basic statistics class, to see the level of chaos of teams, home/away dynamics, and outcomes in both reality and probability. In effect, it is our “chaos meter” and gives us “level of chaos” values for each team, coach, and home/away performance.
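For the curious, here is a rough sketch of how the four inputs above could be pulled out of a game log. It is written in Python rather than the R of our actual script, and the game log, the `venue_profile` function, and the dictionary keys are all made up for illustration. The same function would work on xG columns instead of goals.

```python
from statistics import mean, stdev

# Hypothetical game log (not real Quakes results): (venue, goals_for, goals_against)
games = [
    ("home", 2, 1), ("home", 3, 3), ("home", 1, 0), ("home", 2, 2),
    ("away", 0, 2), ("away", 1, 3), ("away", 2, 2), ("away", 0, 1),
]

def venue_profile(games, venue):
    """Mean and sample standard deviation of goals for/against at one venue."""
    goals_for = [gf for v, gf, ga in games if v == venue]
    goals_against = [ga for v, gf, ga in games if v == venue]
    return {
        "mean_for": mean(goals_for),
        "sd_for": stdev(goals_for),          # the "chaos" value for scoring
        "mean_against": mean(goals_against),
        "sd_against": stdev(goals_against),  # the "chaos" value for conceding
    }

home = venue_profile(games, "home")
away = venue_profile(games, "away")
print(home)
print(away)
```

A higher `sd_for` or `sd_against` means results swing more from game to game, which is all we mean by chaos here.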
In short, what we found was, although Covelo’s chaos values were lower than Almeyda’s at home, they were slightly higher on the road. However, neither Almeyda nor Covelo had higher chaos values than the league average. Uh oh.
That means this simulator isn’t suggesting we just tame the Cardiac Quakes and reduce the chaos. It’s telling us that, despite some better talent, the 2022 Quakes just weren’t a very good team. They were better under Covelo, but, as we’ve shown, only about six points better by improving at home. As the real-world 2022 season told us, Covelo’s results were worse than Almeyda’s on the road.
The results were consistent enough within the MLS league bands, just consistently bad. By taming the chaos (lowering the standard deviation), the team didn’t improve — it actually got slightly worse. The assumption Trevor and I started with was that reducing the chaos would improve things. In fact, the simulator proved definitively that inconsistency wasn’t the issue.
So what about improving the defense? Surely if this team improves on the defensive side of the ball, that’s what is needed. After all, the team was tied for eighth in the league in goals for, so the attack shouldn’t be the issue.
The league average in 2022 was 50 goals against. You will recall the Quakes gave up 69. If we can improve our defense by 30% that should be good enough to expect to make the playoffs. We’ll run 10,000 seasons independent of the coach in order to test this out.
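As a quick sanity check on that 30% figure, the arithmetic looks like this in a few lines of Python. The variable names are ours, and we are assuming the improvement is applied as a flat cut to the season total, which may not be exactly how the simulator applies it game by game.

```python
goals_against_2022 = 69   # Quakes goals conceded in 2022
league_average = 50       # league-average goals against in 2022
improvement = 0.30        # the defensive improvement being tested
games = 34

# A flat 30% cut to the season total (an assumption for illustration)
target = goals_against_2022 * (1 - improvement)

print(round(target, 1))           # 48.3
print(round(target / games, 2))   # 1.42
print(target < league_average)    # True
```

So a 30% improvement lands the Quakes just under the league average for goals against.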
That looks much closer to what we want. With a 30% defensive improvement built in, the Quakes now beat their 35 points in 2022 99% of the time. They make the playoffs 74% of the time. If you gave Quakes fans a 74% chance of making the playoffs before the season started, they’d take it every single time. That’s basically playoffs three out of every four seasons.
Improving 100% in one season? Nearly impossible. But improving 30% year-over-year for three seasons wins you the Western Conference, probably a Shield, and a very good chance of winning MLS Cup in one of those playoff runs.
Problem solved, right? Not so fast. We need to discuss how to improve 30% defensively while not losing any of the attacking ability. If you want to fix the defense, you can play Dom-ball, but the attack will lose its potency.
The Quakes need to fix the defense while also preserving, and ideally improving, the attack. That’s an extremely tall order. This will become the focus of parts two and three of this series.
Enter the Luchi Zone
Luchi Gonzalez does get a clean slate. He said all the right things in his opening press conference and phone calls with the Bay Area sports media. He has had the chance to examine first-hand Gregg Berhalter’s defensive principles and figure out how to incorporate some of them into his model. It’s critical that he figures out what changes in both personnel and tactics will make playoffs a likely outcome rather than an exceptional outcome for the next few years.
His teams in Dallas were neither defensively-minded nor teams that won regularly on the road — two big albatrosses that our simulator proved are the primary Quakes issues. Those albatrosses are not news; we’ve discussed them frequently on The Aftershock. Plus winning on the road in MLS is hard. But now we have definitive proof of their impacts and the amount of improvement needed to consistently make the playoffs.
In part two of this series, we will look at player execution post-Almeyda that doomed the Quakes in 2022, discuss how to fix those situations, and see if a better set of tactics in 2023 with the same talent level is enough to get the team into the playoffs.
Rather than taming the chaos of the last few seasons of the Quakes, this series will seek to bring order within the chaos of MLS for optimum results.
If you have read enough about simulating MLS seasons, stop reading now. If this concept intrigues you, keep right on reading.
About our Season Simulator
All the data used by our season simulator is available publicly, through sites such as americansocceranalysis.com and fbref.com. It uses both actual scores and expected goals from completed games. We have used the outcomes of the 2022 MLS season for our simulator, but we could just as easily take several seasons if the team dynamics didn’t change season-over-season as they do.
The code for our simulator was developed as an R script and could be put into a Shiny web app in the future once we have developed it to the extent we would like. While a web app is not the goal, it would be more transparent and allow further exploration by other individuals. Future enhancements could simulate an in-progress season using results from the current season and a previous season until enough data in the current season is available for realistic predictions. We are not aware of a publicly available app that already does what our simulator is currently capable of doing, although it is likely one exists for one or more Big Five leagues.
We could also use the same code in other leagues. The level of predictability and chaos of a particular league is built into the goals, expected goals, and standard deviation of a team’s performances.
To validate the outcomes, we compared results to recent MLS seasons to ensure the points earned by both Supporters Shield and Wooden Spoons winners were within the band of typical error margins. We also compared individual teams by goals for, goals against, goal differential, and points to ensure the outcomes felt realistic based on what we know from the teams in 2022.
The full league season simulator is not intelligent enough to distinguish season-impacting changes to coaches or mid-season signings and may never be. It can only say how much improvement is needed. We had to make specific adjustments to break out our Almeyda and Covelo simulations. In order to account for the inherent sample size issues, we added the 2021 season to Almeyda. A better comparison would have required an even 17-game split of the same roster between the two coaches.
Every run of seasons, whether 10, 1,000, 10,000, or 1,000,000, will produce somewhat different results, but a high amount of stability is achieved even after a couple of hundred seasons. 10,000 seasons took several hours to run, but also provided smoother team histograms (bar curves) compared to 1,000 seasons, so that’s what we chose for this exercise. Who can argue with the results of 10,000 seasons after all? What will 100,000 seasons tell you that 10,000 cannot? The answer is not much.
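One way to put a number on “not much” is the textbook standard error of a simulated percentage. The sketch below is Python, not our R script, and it assumes each simulated season is an independent draw. The `standard_error` function name is ours, and the 74% figure is the playoff chance from the 30% defensive improvement run earlier in the article.

```python
import math

def standard_error(p, n_seasons):
    """Standard error of a proportion p estimated from n independent seasons."""
    return math.sqrt(p * (1 - p) / n_seasons)

playoff_chance = 0.74  # from the 30% defensive improvement run

for n in (200, 1_000, 10_000, 100_000):
    # Report the noise in percentage points
    print(n, round(100 * standard_error(playoff_chance, n), 2))
```

Under those assumptions, the noise in the playoff percentage is roughly 3 percentage points at 200 seasons, about 0.4 at 10,000, and about 0.1 at 100,000. Going from 10,000 to 100,000 buys only a few tenths of a point.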
Here are the Supporters Shield winners after 10,000 2022 seasons:
And here are the Wooden Spoons after 10,000 2022 seasons:
As you can see, these results feel realistic based on both real and perceived 2022 performances.
We also played with various combinations of goals (real results), expected goals (probability-based results), and effects on the standard deviations (goals for the mean values, expected goals for standard deviation values, and vice versa). We ultimately settled on a mixed model of goals and expected goals for both the mean and standard deviation values. On average, this brought down teams like Austin and Vancouver and brought up teams like Nashville and Atlanta United, which all felt very defensible and realistic. Austin, for example, in the real-world 2022 season started the season hot, finished cool, and were convincingly bounced from the playoffs by a far superior LAFC side that only missed the playoffs once in our 10,000 simulated seasons.
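To make the idea of a mixed model concrete, here is a minimal Python sketch of blending a goals-based value with an xG-based value. The equal weighting, the `blend` function, and the sample numbers are assumptions for illustration only and are not taken from our R script.

```python
def blend(actual, expected, weight=0.5):
    """Weighted mix of a goals-based value and an xG-based value.

    weight=0.5 (an assumed default) treats real results and
    probability-based results equally.
    """
    return weight * actual + (1 - weight) * expected

# Hypothetical team that scored more than its xG suggested
goals_per_game = 2.0
xg_per_game = 1.6

print(blend(goals_per_game, xg_per_game))
```

A team that finished above its expected goals gets pulled down by the blend, and a team that finished below gets pulled up, which is the direction of the adjustments described above.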
Our season simulator thinks Austin FC (above) overachieved their true performance by about seven points in the 2022 MLS season, while the Earthquakes (below) were right on their true performance at 35 points.
Our season simulator code also recognizes the most important aspect of analyzing MLS: its unique home-field advantage. On average 50% of home teams win and 25% of away teams win in MLS, as compared to dynasty leagues like the English Premier League where the home team wins around 43% and the away team wins around 32%. We have computed home and away results separately and given them their own means and standard deviations. A more complex version of the simulator could split the season even further or use other inputs like ASA’s Goals Added (g+).
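Here is a stripped-down Python sketch of what separate home and away means and standard deviations look like when you replay one team's season many times. It is not our R script: it simulates a single team rather than the whole league, it clamps scores into the zero-to-eight range as a simple stand-in for the bounded draw, and the function names and profile numbers are invented for illustration.

```python
import random

def draw_score(mean, sd, rng, floor=0, ceiling=8):
    """Normal draw, rounded to a whole score and clamped to [floor, ceiling]."""
    return max(floor, min(ceiling, round(rng.gauss(mean, sd))))

def simulate_season(home, away, rng, games_per_venue=17):
    """Play 17 home and 17 away games and return the points total."""
    points = 0
    for profile in (home, away):
        for _ in range(games_per_venue):
            gf = draw_score(profile["gf_mean"], profile["gf_sd"], rng)
            ga = draw_score(profile["ga_mean"], profile["ga_sd"], rng)
            points += 3 if gf > ga else 1 if gf == ga else 0
    return points

def playoff_share(home, away, line=47, seasons=10_000, seed=1):
    """Share of simulated seasons that reach the playoff line."""
    rng = random.Random(seed)
    hits = sum(simulate_season(home, away, rng) >= line for _ in range(seasons))
    return hits / seasons

# Hypothetical profiles, not the real Quakes inputs
home = {"gf_mean": 1.8, "gf_sd": 1.3, "ga_mean": 1.6, "ga_sd": 1.3}
away = {"gf_mean": 1.2, "gf_sd": 1.1, "ga_mean": 2.3, "ga_sd": 1.4}

print(playoff_share(home, away))
```

Lowering the `ga_mean` values in either profile is the kind of parameter tweak we used to test a defensive improvement.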
Thanks to a publicly-available R function called rbnorm, a variant of the rnorm function that draws random values within set bounds, we are able to control for specific floors and ceilings of scores in soccer games, but we are also able to adjust it if we want to predict something other than a score. In our case, the floor for a game score is zero (duh), and the ceiling is eight. However, we didn’t end up with randomized scores like 7-to-6 any more often than in a normal season, thanks to the goals mean in a game being around three, rbnorm’s understanding of that mean, and a team’s standard deviation from that mean. rbnorm helped us generate very realistic scores.
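If you want a feel for what a bounded draw does, here is a minimal Python sketch. To be clear, this is not rbnorm itself and may not match how rbnorm works internally: it simply redraws from a normal distribution until the rounded score lands between the floor and the ceiling. The function name, the seed, and the mean and standard deviation are made up for illustration.

```python
import random
from collections import Counter

def bounded_normal_score(mean, sd, rng, floor=0, ceiling=8):
    """Redraw from a normal distribution until the rounded value is in bounds."""
    while True:
        score = round(rng.gauss(mean, sd))
        if floor <= score <= ceiling:
            return score

rng = random.Random(2022)  # fixed seed so the run is repeatable

# Hypothetical team averaging 1.5 goals with a standard deviation of 1.2
scores = [bounded_normal_score(1.5, 1.2, rng) for _ in range(10_000)]

# How often each score appears, from 0 up to the ceiling
print(sorted(Counter(scores).items()))
```

With a mean near one and a half goals per team, the high scores are rare on their own, so the ceiling of eight almost never comes into play.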
Team goal differential also lands within very realistic norms using our code and rbnorm, bringing even more credibility to our simulator’s results. The Quakes, for example, were on average less than one goal different from the actual 2022 season (-17 goal differential) even after 10,000 seasons (-16 goal differential average).
Quakes Epicenter patrons can get even more information on our season simulator in our patron Slack. Join or upgrade for $5 a month for the 2023 season to get access to our patron Slack. Thanks for reading!