Statistical Scouting: The MLS SuperDraft
Now that the MLS SuperDraft is almost one week away, I thought it would be interesting to see if it was possible to create a rough model for picking players given a few different factors. I used the minutes played in each player’s first two seasons as an indicator of that player’s success. I previously found that almost all great players play at least roughly 1000 minutes in their first two years, with three major exceptions: Chris Wondolowski (203 mins), Ethan Finlay (817 mins), and Justin Morrow (869 mins). As for the explanatory variables, I chose the college and PDL team the player played for, U.S. Under-18, Under-20, and Under-23 National Team experience*, and whether the player was Generation Adidas or not.
Creating the model
Using the data, I created a regression decision tree in R, which had an R-squared of 0.2514 (which basically means that about 25% of the variation in minutes played can be explained by the model)… pretty disappointing.** However, if you look at the hit rate, the accuracy looks a lot better.
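For anyone who wants to check an R-squared figure by hand, here is a minimal sketch in Python (not the R code used for the model). The function name and the minutes below are made up for illustration; the predictions are simply the average of each hypothetical group, which is how a regression tree leaf would predict.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted`.

    Computed as 1 - SS_res / SS_tot.
    """
    mean_actual = sum(actual) / len(actual)
    # Total variation of the observed minutes around their mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    # Variation left over after using the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot


# Hypothetical minutes for six players (NOT data from the post).
actual = [0, 0, 300, 1200, 1800, 2400]
# Hypothetical tree with two leaves: each player gets the leaf average.
predicted = [100, 100, 100, 1800, 1800, 1800]

print(round(r_squared(actual, predicted), 4))
```

A value of 1 would mean the predictions match the minutes exactly, and a value of 0 would mean the model does no better than guessing the overall average for everyone.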
|Category|successful players / total predicted|% successful|
Note: successful is defined as > 1000 minutes
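The hit rate per category can be computed along these lines. This is a rough Python sketch under a few assumptions that are not spelled out in the post: each player is counted only in the highest category their predicted minutes exceed, the success cutoff is passed in as a parameter (the example uses the thousand-minute rule), and the function name and sample numbers are hypothetical.

```python
from collections import defaultdict


def hit_rate_by_bucket(players, thresholds=(3000, 2500, 2000, 1500, 1000),
                       success_minutes=1000):
    """Return {threshold: (successes, total, percent)} per predicted bucket.

    `players` is an iterable of (predicted_minutes, actual_minutes) pairs.
    Assumption: buckets do not overlap; a player lands in the highest
    threshold that the prediction exceeds.
    """
    counts = defaultdict(lambda: [0, 0])  # threshold -> [successes, total]
    for predicted, actual in players:
        for t in sorted(thresholds, reverse=True):
            if predicted > t:
                counts[t][1] += 1
                if actual > success_minutes:
                    counts[t][0] += 1
                break  # count each player in one bucket only
    return {t: (s, n, 100.0 * s / n) for t, (s, n) in counts.items()}


# Hypothetical (predicted, actual) minutes, NOT data from the post.
sample = [(3200, 2500), (3100, 800), (2200, 1500), (2100, 900), (500, 0)]
for threshold, (hits, total, pct) in sorted(hit_rate_by_bucket(sample).items(),
                                            reverse=True):
    print(f">{threshold} mins: {hits}/{total} ({pct:.2f}%)")
```

Players predicted below the lowest threshold are left out of every bucket, so they do not affect any of the percentages.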
Interestingly, Ethan Finlay was the only player predicted to play over 3000 minutes who didn’t play over 1000 minutes, and Justin Morrow was one of the players predicted to play over 2000 minutes who did not, so the model is pretty decent at scouting talent. In the future, I’d like to take it a step further and add whether the player was from the U.S. or not as a factor, to distinguish players like Darlington Nagbe (1185.13 predicted minutes) and Kekuta Manneh (427.78 predicted minutes) who haven’t had the chance to play for U.S. youth teams. Among the model’s other shortcomings is that its lowest minutes projection is 227.33 mins, while one-third of the players in the dataset played 0 minutes, which significantly hurt the R-squared.
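The floor on the projections follows from how a standard regression tree predicts: every player who lands in a leaf gets the average minutes of the training players in that leaf. A small Python sketch with made-up numbers (the leaf contents are hypothetical, not taken from the dataset) shows why a leaf that mixes zero-minute players with a few contributors can never project 0:

```python
def leaf_prediction(minutes_in_leaf):
    """A regression tree leaf predicts the mean of its training targets."""
    return sum(minutes_in_leaf) / len(minutes_in_leaf)


# Hypothetical lowest leaf: half the players never saw the field.
lowest_leaf = [0, 0, 0, 150, 420, 800]
projection = leaf_prediction(lowest_leaf)

# Every zero-minute player in this leaf is over-projected by the same amount,
# and each of those misses adds to the squared error behind the R-squared.
squared_error_from_zeros = sum(
    (m - projection) ** 2 for m in lowest_leaf if m == 0
)
print(projection, squared_error_from_zeros)
```

The projection only drops to 0 if the tree manages to isolate a leaf made up entirely of zero-minute players.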
|Category|% successful|players in category|
|---|---|---|
|>3000 mins|83.33%|(3) Jonathan Campbell, Brandon Vincent, Michael Gamble|
|>2500 mins|72.72%|(1) Jordan McCrary|
|>2000 mins|70.00%|(2) Kyle Fisher, Jake Rozhansky|
|>1500 mins|66.67%|(1) Tyler Thompson|
|>1000 mins|54.44%|(13) Mikhail Doholis, Keegan Rosenberry, James Moberg, Wade Hamilton, Dennis Castillo, Omar Holness, Connor Sparrow, Abu Danladi, Nick DePuy, Duncan Backus, Hadji Barry, Paul Clowes|
Note: players in bold are on Soccer by Ives’ Big Board Version 1
It’s pretty likely the Quakes will stay at pick 8, because they haven’t traded their first-round pick since 2011. Looking at the early mock drafts, snagging one of Brandon Vincent, Jonathan Campbell, or Kyle Fisher to shore up an ageing backline would be ideal. Otherwise, trading down a bit and selecting Jordan McCrary or a GA player should be on the Quakes’ radar. The model was built primarily for selecting American players, so I’ve added another column to the raw data with the nationality of each player, highlighted green if from the U.S. Even though Fabian Herbers, Richie Laryea, Tim Kubel, and Joshua Yaro were born overseas, they are each projected for 930.59 minutes, very close to 1000 minutes. Herbers had a monstrous season at Creighton, with 15 goals and 17 assists. Despite the same hindrance, Omar Holness still projected fairly high at 1313.33 minutes.
In the second round, picking the best player available is the obvious choice, given that only roughly 10-13 players from this draft will beat the thousand-minute rule (and around 6-8 of those players will be predicted by the model). If Michael Gamble somehow falls that low, he’d be a no-brainer. More likely, Dennis Castillo would have a good chance of gaining minutes as a back-up right back. He was called up for Costa Rica’s U18, U20, and U23 teams, but would probably take up an international roster spot. Tyler Thompson is rated highly by the model, but San Jose already has Fatai Alashe (22), Marc Pelosi (21), and Anibal Godoy (25) at center midfield.
Also, although the model did not recommend Mitchell Lurie, I’d just like to mention him. In an earlier model with a dismal R-squared, he was one of the players predicted to break the thousand-minute rule; he is fourth in assists (6) among defenders in DI according to TopDrawerSoccer, played for the Portland Timbers U-23 team, and was called up to the U.S. U18 MNT. He could be worth a shot in the Supplemental Draft.
*Since no player in the draft pool has been capped by the U.S. U23 team yet, I used a U23 College Identification Training Camp held over the summer as a substitute.
**But, if you think about it, it makes sense that the R-squared would be that low. Bad players can also play for good college teams and get called up for U.S. youth teams, so you still need to do a lot of scouting.