Similarity Scores

Before I get started, allow me a quick introduction.  I’m Colin, who you may remember from my dormant blog Championship at Best.  Yep, I’m that stats geek.  Rich invited me to contribute an occasional piece over here, and since I haven’t really had the time to maintain a site on my own, that invitation sounded excellent.  I’m excited to share any interesting bits of information I come up with, and that’s what I’ll try to do now.  So, on to the fun stuff!

One thing I see a lot, especially in team-oriented sports such as football, is that fans will compare their players with other players.  Most often, I see comparisons of physical attributes – height, strength, pace, sometimes even nationality.  Sometimes these comparisons work, sometimes they are easily debunked.  I’d like to see if we can come up with something totally objective, and just maybe, something that works.

A ton of my work has been borrowed from baseball writer/statistician Bill James.  One thing he came up with was the concept of a similarity score, in which he compared players across several statistical categories to show how similar or different two players are.  It’s a simple formula – start with 1000 points, and subtract the difference in each category.  The closer to 1000, the more similar the players are.  Here, I’ve taken this concept and applied it to the 2009/10 Premier League season.  Each category is weighted differently, depending on how “important” it is to that player.  For example, if Clint Dempsey makes a lot of tackles, then tackles are weighted more heavily in determining the players most similar to Dempsey.  As it is, Dempsey is a unique player, and one whose place has been under debate recently.  Here are his top ten comparables, based on last season:

Sebastian Larsson Birmingham 969
Zoltán Gera Fulham 969
Matthew Taylor Bolton 968
Stephen Hunt Hull City 962
James McFadden Birmingham 961
Damien Duff Fulham 960
Morten Gamst Pedersen Blackburn 960
Steven Fletcher Burnley 960
Kevin-Prince Boateng Portsmouth 959
Kevin Doyle Wolves 959
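
For anyone curious how the formula works in practice (start at 1000, subtract the weighted difference in each category), here’s a minimal Python sketch.  It is not the exact calculation behind the scores in this post: the category names, the toy numbers, and the `importance_weights` rule (weighting each category by the player’s output relative to the league average, so a heavy tackler gets tackles weighted more) are hypothetical assumptions for illustration.

```python
# Minimal sketch of a Bill James-style similarity score.
# ASSUMPTION: stat categories, weights and numbers are hypothetical;
# this is not the exact formula behind the scores in this post.


def similarity(target: dict[str, float], other: dict[str, float],
               weights: dict[str, float], start: float = 1000.0) -> float:
    """Start at `start` points and subtract the weighted gap in each category."""
    score = start
    for category, weight in weights.items():
        score -= weight * abs(target[category] - other[category])
    return score


def importance_weights(target: dict[str, float],
                       league_avg: dict[str, float]) -> dict[str, float]:
    """HYPOTHETICAL weighting rule: a category counts for more when the
    target player produces more of it relative to the league average."""
    return {cat: target[cat] / league_avg[cat] for cat in target}


# Toy numbers for two made-up midfielders (not real 2009/10 data).
player_a = {"tackles": 60, "goals": 3}
player_b = {"tackles": 54, "goals": 5}
league_avg = {"tackles": 30, "goals": 6}

weights = importance_weights(player_a, league_avg)  # tackles 2.0, goals 0.5
print(similarity(player_a, player_b, weights))       # 1000 - 12 - 1 = 987.0
```

Because raw differences are subtracted directly, a category counted in large numbers (passes, say) will swamp one counted in small numbers (goals) unless its weight is scaled down accordingly.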

It’s interesting to note that two of the players he’s competing with for a place (Gera and Duff) make the list.  This may say a lot about Roy Hodgson’s shape, structure and defined roles, but it could also suggest that the three are interchangeable.  Let’s look at the other two players in question.  Here’s Gera’s top 5:

Leon Osman Everton 971
Chris Eagles Burnley 970
Glenn Whelan Stoke 969
Sebastian Larsson Birmingham 969
Andy Reid Sunderland 967

and Duff:

Craig Bellamy Man City 972
Mark Noble West Ham 970
Luka Modric Tottenham 965
Matthew Etherington Stoke 964
Samir Nasri Arsenal 956

Notably, one-time rumored signing Craig Bellamy (who instead joined Cardiff on loan) was Damien Duff’s #1 comparable.  While Bellamy is obviously a major talent, his signing might have created an even greater logjam for us.  If we were to sell a player, say Dempsey or Gera, then perhaps Birmingham’s Larsson would be a suitable replacement?  At the very least, his numbers suggest he could slide in and do the same job.

That, I feel, is where this information might be most useful.  If a key player gets sold, can we find a replacement that fits in with our system?  What if Brede Hangeland moves on to a bigger club?  His comparables suggest that finding a drop-in replacement could be difficult and costly:

Nemanja Vidic Man Utd 970
Kolo Touré Man City 960
Sébastien Bassong Tottenham 958
Aaron Hughes Fulham 953
Michael Turner Sunderland 950
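
Finding a drop-in replacement then amounts to scoring every other player against the one who’s leaving and keeping the highest totals.  Another hedged sketch: the player names and stats are made-up placeholders, and the optional `exclude` parameter (for skipping players already in the squad, like Aaron Hughes in Hangeland’s list) is my own hypothetical addition.

```python
# Sketch: rank the closest comparables to one player.
# ASSUMPTION: player names, stats and the `exclude` option are hypothetical.


def similarity(target, other, weights, start=1000.0):
    """Bill James-style score: start at 1000, subtract weighted gaps."""
    score = start
    for category, weight in weights.items():
        score -= weight * abs(target[category] - other[category])
    return score


def top_comparables(name, players, weights, n=5, exclude=frozenset()):
    """Return the n most similar players to `name` as (player, score) pairs,
    highest score first, skipping `name` itself and anyone in `exclude`."""
    target = players[name]
    scored = [
        (other, similarity(target, stats, weights))
        for other, stats in players.items()
        if other != name and other not in exclude
    ]
    # Sort by score, best match first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Made-up centre-backs with made-up stats.
players = {
    "Player A": {"tackles": 50, "goals": 2},
    "Player B": {"tackles": 48, "goals": 2},
    "Player C": {"tackles": 20, "goals": 10},
    "Player D": {"tackles": 45, "goals": 3},
}
weights = {"tackles": 1.0, "goals": 2.0}

print(top_comparables("Player A", players, weights, n=2))
# [('Player B', 998.0), ('Player D', 993.0)]
print(top_comparables("Player A", players, weights, n=2, exclude={"Player B"}))
# [('Player D', 993.0), ('Player C', 954.0)]
```

Sorting the whole list is fine for a league’s worth of players; `heapq.nlargest` would do the same job without a full sort if the pool were much larger.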

Similar story for Bobby Zamora:

Emmanuel Adebayor Man City 977
Dimitar Berbatov Man Utd 971
Louis Saha Everton 967
Frédéric Piquionne Portsmouth 962
Carlton Cole West Ham 962

While the top three are going to be out of our price range, we had heard rumors earlier this summer that Fulham were looking to sign Frédéric Piquionne.  And more recently, I’ve read that Roy is interested in Carlton Cole for his Liverpool squad.  I don’t think that’s a coincidence.

I expect this formula will always need some tweaking, but I think we already have some reasonably accurate results.  If my programming skills allow it, I hope to have an online tool available soon for checking these comparisons for any player.  I’d be happy to answer any questions or implement any suggestions, so feel free to share any ideas!