The most-read post in the history of our blog with a headline not containing “Ben Arfa” was our discussion last fall of whether Newcastle's rise in the Premier League might represent the arrival in soccer of moneyball. For readers abroad who may not be familiar, "moneyball" is an American sports term derived from the acclaimed book and movie of the same name, based on the story of how the low-budget Oakland A's and their pioneering General Manager Billy Beane (played on screen by Brad Pitt) used modern statistical analysis to compete effectively with much bigger and richer clubs in Major League Baseball.
Baseball isn’t a continuous-flow sport like soccer. The discrete actions of batting, pitching and fielding are easily recordable and measurable, producing an almost unlimited wealth of analyzable statistics. The same goes for American football and its structure of individual plays, as I found during my time as a doctoral student in political science, when it turned out my main talent wasn’t for politics, but for numbers. For a project in a quantitative research course I used eight seasons’ worth of NFL statistics to construct a rating and handicapping system that might have made me a Las Vegas retiree by age 40 if the NFL scheduled 10,000 games a season instead of 256. As it stood, all I got from it was an A.
When I wrote the moneyball piece on Newcastle for our blog last fall, I was dying to conduct a similar study for soccer, to help gauge whether Newcastle’s sudden rise was a mere stroke of fortune, or if the club had stumbled onto something that might help smooth out the economic disparities of soccer as the A’s have in baseball. Unfortunately, soccer is much stricter and weirder than American sports about shielding broadcasts, highlights and information from fans, preferring to charge directly rather than promote widely. (Judge for yourself which approach works better; all I’ll say is the NFL team I support is located in a city of 100,000 and draws 70,000 fans a game.) For my NFL study, most statistics I needed or wanted were available for free. In soccer, all but the most basic and random statistics are hoarded away by the league, the clubs or third-party organizations, and are either unavailable to the public or shockingly expensive.
Or so I thought until recently, when we at the blog discovered an affordable and useful trove of selected Opta soccer statistics via a service known as EPL Index. EPL Index doesn’t have everything I want or need to make sense out of a sport that isn’t as suited to numerical analysis as American football. But it has enough for an interesting start, which has caused me to spend way too much time the past two weeks coding Premier League team statistics since 2008-’09 (the earliest season available) into an analytical program on my MacBook. Using procedures like correlation and regression of which I’ll spare you the close details, I’ve identified or constructed a dozen statistics that seem, from what I can tell so far, critically relevant to a club’s success on the table, especially in the context of Newcastle United and its surprising showing this season.
At the top of the statistical hit parade, unsurprisingly, are goal difference and goals scored. (Goals conceded, on their own, don’t seem as closely related to a club’s overall success – perhaps a discussion for another time.) Goal difference alone is about 90 percent proportional to a team’s overall points in the league. That may seem rather unremarkable – after all, isn’t the object of the game to score more goals than the other team? Yes, but what is somewhat remarkable is the lack of chance or luck that seems to be involved, especially in a sport where scoring is rare and one bad bounce can turn a match. For example, a team might grind out three victories by a goal each and then give up a couple of weird goals on a bad day and lose 3-0, resulting in 9 points on the table but a goal difference of 0. By and large, though, this doesn’t happen, and when it does, it balances out quickly in the other direction. Once in a great while a Premier League club will achieve far above or below its goal difference over an entire season, one notable example being Everton’s 2004-2005 team, which finished fourth with a -1 goal difference. But outcomes like this are exceedingly rare, and reality tends to settle in swiftly (Everton finished 11th the following season). In the end, even with the relative scarcity of scoring, goal difference in soccer is more predictive of a team’s overall record than margin of victory in American football, perhaps because American football plays only 16 games in a season and there isn’t time for irregularities to even out.
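To make the example above concrete, here is a minimal sketch of the bookkeeping, assuming the standard 3 points for a win and 1 for a draw. The function name `tally` and the scoreline list are made up for illustration and are not part of my actual analysis.

```python
def tally(results):
    """Return (points, goal_difference) for a list of (goals_for, goals_against)."""
    points = 0
    goal_difference = 0
    for goals_for, goals_against in results:
        goal_difference += goals_for - goals_against
        if goals_for > goals_against:
            points += 3  # win
        elif goals_for == goals_against:
            points += 1  # draw
        # a loss adds no points
    return points, goal_difference


# Three one-goal wins followed by a 3-0 loss, as in the example above.
# The 1-0 scorelines are an assumption; any one-goal wins give the same tally.
results = [(1, 0), (1, 0), (1, 0), (0, 3)]
points, goal_difference = tally(results)
print(points, goal_difference)  # 9 0
```

Nine points from four matches is a strong return, yet the goal difference says "mid-table." The claim in the text is that gaps like this rarely survive a full season.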
Newcastle’s current 6th-place standing on the league table is only slightly better than its 7th spot in goal differential and 8th spot in goals scored. Not much luck to be seen there. It’s true that, taking historical stats into account, a team with only 36 goals and a 0 difference after 25 games should have significantly fewer points (about 33) and stand significantly lower on the table (about 10th) than Newcastle does. But that’s not about Newcastle – that’s about the table. Otherwise Newcastle would rank lower in the individual categories too. The big, rich teams continue to consolidate more of the available talent, making it steadily easier to be best of the rest.
To play moneyball, however, one needs to unearth the more obscure statistics that control the obvious ones. Goal difference and goals scored appear to be underpinned by three slightly less celebrated statistics: chances created in open play (set plays, not as much; again, a different discussion), shots on target and minutes in possession. This, too, is rather unremarkable on its face. The more chances and shots you get, the more you’ll score; the more you possess the ball, the more chances and shots you’ll get and the fewer there will be for the opponent. In these areas, though, Newcastle stands only 14th, 14th and 11th in the league. Based on that, Newcastle’s goal scoring and differential, and consequently Newcastle’s spot on the table, should almost certainly be worse than they are. Something unusual is happening.
Or shall we say, someone unusual: Demba Ba. Of the balls that have gone to Ba’s foot in scoring position, nearly two-thirds have gone on target, and half of those have gone in. This is well ahead of the next player (Robin van Persie at 22 percent overall conversion), not to mention the rest of the league. As we examined here last week, it’s clear there’s more luck in converting a chance than creating one. So is Ba on a lucky streak, or is he that good? Given his track record at West Ham last season and in Germany before that, Ba appears to be among the small handful of strikers, league-wide and worldwide, with a true talent for attracting high-quality chances and/or putting chances on goal and into the net regardless of their quality. Having rained similar numbers of goals for an inferior club in Germany, Papiss Cissé may be cut from Ba’s cloth, which is probably Newcastle’s best hope for maintaining or improving its place on the table by the end of the season.
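For anyone who wants the arithmetic behind Ba's conversion figure, here is a rough sketch. It treats "nearly two-thirds" as exactly two-thirds and "half" as exactly one-half, so the result is a slight overestimate, and the variable names are mine rather than Opta's.

```python
from fractions import Fraction

# Rounded inputs taken from the paragraph above (assumption: exact 2/3 and 1/2).
on_target_rate = Fraction(2, 3)        # share of Ba's chances that go on target
scored_given_on_target = Fraction(1, 2)  # share of those on-target shots that go in

# Overall conversion = chance -> on target -> goal.
overall_conversion = on_target_rate * scored_given_on_target

van_persie_conversion = Fraction(22, 100)  # the 22 percent quoted above

print(f"Ba, approx. overall conversion: {float(overall_conversion):.0%}")
print(f"Van Persie, quoted conversion:  {float(van_persie_conversion):.0%}")
```

That puts Ba at roughly one goal for every three chances, against a little better than one in five for the next man on the list.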
Ba and his unusual impact on Newcastle’s goal difference explain some of what’s happening this year. But the next level down in statistical obscurity is the true moneyball level – the little things all players and teams must do moment by moment to control the bigger things, and ultimately the league table. Here are six little things about soccer that look intriguing, in order of importance: non-backward passes (not the accuracy but the raw number, which I added up from Opta’s passes left, passes right and passes forward), successful dribbles (defined by Opta as moving past a defender while maintaining possession), percentage of 50-50s won on the ground (in the air, not so much, see below), successful tackles, intercepted passes, and a statistic I constructed by dividing the total number of 50-50s received on the ground by the total number of 50-50s received. I’ve named that last stat “ground ratio,” and it’s meant to measure how much a team plays on the ground rather than in the air. It’s far from a perfect measure, but it does a fairly decent job at projecting the overall table, and it’s the best I could do with the stats available, a number of which hint ominously that the more and higher a team lifts a soccer ball, the more it will underperform. Here is the level of analysis at which I begin to worry that Newcastle could be overachieving and due for a fade. Again the club stands visibly worse than its spot on the table on all these counts, and there’s no simple or reassuring explanation.
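For the curious, the two statistics I built myself can be sketched as follows. This is an illustration, not my actual worksheet: the field names are hypothetical rather than Opta's or EPL Index's labels, the code assumes every 50-50 is counted as either ground or aerial, and the sample numbers are invented purely to show the calculation.

```python
from dataclasses import dataclass


@dataclass
class TeamCounts:
    """Season totals for one club (field names are hypothetical)."""
    passes_left: int
    passes_right: int
    passes_forward: int
    ground_5050s: int
    aerial_5050s: int


def non_backward_passes(team: TeamCounts) -> int:
    # Raw count, not accuracy: every pass that did not go backward.
    return team.passes_left + team.passes_right + team.passes_forward


def ground_ratio(team: TeamCounts) -> float:
    # Share of all 50-50s that were contested on the ground.
    # Assumption: total 50-50s = ground 50-50s + aerial 50-50s.
    total = team.ground_5050s + team.aerial_5050s
    if total == 0:
        raise ValueError("no 50-50s recorded")
    return team.ground_5050s / total


# Invented numbers, for illustration only.
example = TeamCounts(passes_left=100, passes_right=120, passes_forward=200,
                     ground_5050s=300, aerial_5050s=100)
print(non_backward_passes(example))  # 420
print(ground_ratio(example))         # 0.75
```

A ground ratio closer to 1 means a club contests most of its loose balls on the floor; a lower number means more of its play is going through the air.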
But we’ve covered only 11 of my 12 favorite statistics so far, and on the 12th, Newcastle is more than holding its own.
Like fielding in baseball, goalkeeping in soccer seems maddeningly difficult to measure. Making a lot of saves may just mean a keeper is on a bad club. Save percentage doesn’t say anything about the difficulty of the saves or the positioning of the keeper. Most of the goalkeeper stats seem almost absurd in their lack of relation to anything. Pick-ups? Really?
But one goalkeeping stat available through EPL Index caught my eye: high cross catches. The ability to snare a cross would seem to be a decent measure of a bunch of key goalkeeping skills at once: reaction time, positioning, surehandedness, strength, involvement. A cross catch also robs the other club of a chance, and chances occupy a lofty spot on the statistical hierarchy. Cross catches probably wouldn’t be quite as inflated by the overall performance of a keeper’s team as goals or shots or saves, either. Sure enough, of all the dumb goalkeeper stats I had at my disposal, high cross catches turned out to be the one that was somewhat weakly but almost certainly associated with a club’s overall success. I suspected before I looked that Newcastle and its flying Dutchman would be high on the cross catch table. In fact, it’s the single area I’ve found so far in which the Magpies are running away from the league.
So, is Newcastle actually good, or just lucky? My best guess so far, as a moneyball-loving American, is that Newcastle is both pretty good, and very lucky – lucky to have Demba Ba and Tim Krul.
Cracking article as usual. No idea if any of it is accurate or truly relevant, but it was an interesting read regardless. Cheers!
Posted by: LeeSibbald | 02/23/2012 at 09:32 AM
thats a whole lot of thought going on in that article....interesting though....it just boils down to making the most of the chances you get. and playing solid defense. can't complain with this season if we make it to europe.
Posted by: Jaeger | 02/23/2012 at 02:22 PM
I've mentioned this in the past, if you haven't read 'Soccernomics' and like stats driven arguments, check it out.
Posted by: JeffC | 02/23/2012 at 05:36 PM
Nice work Bob, just know that when Krul gets gobbled up by a big time club this summer it will all be your fault...
Posted by: Dave From Newcastle | 02/24/2012 at 09:52 AM