Friday, March 8, 2013

Some changes to the rankings tables

Updated Sunday 3/10/13
After a long discussion Saturday night with another statistician, I've made a change to the total column. This doesn't change the rankings, only the number in the column and what that number means. The text below describes the updated version.

The meaning of the total column has always been the ratio of how many goals the stronger team puts in to how many goals the median team puts in, expressed as a logarithm. It is a measure of the relative rate of scoring. Originally the logarithm was base e, but I have changed it to base 2. This means that the difference in total strength between two teams is now interpreted as follows: the stronger team puts in 2^(difference) times as many goals. A difference of 1 means the stronger team scores 2x as much, so for every 2 goals the stronger team puts in, the weaker team puts in 1. A difference of 2 in the total strengths means the stronger team puts in 2^2=4 times as many goals, so it scores 4 times (on average) before the weaker team scores once. The top team in an age division often has a total strength of about 6. At a difference of 6, the stronger team puts in 2^6=64 times as many goals, so it will put in 64 goals (on average) before the weaker team puts in 1, which is why we have divisions(!)
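To make the arithmetic concrete, here is a minimal Python sketch of the interpretation above. The function names are my own for illustration and are not part of the ranking code. The base-e to base-2 conversion is just the standard change of base (divide by ln 2), which I am assuming is all that changed in the column.

```python
import math

def to_base2(strength_base_e: float) -> float:
    """Convert a strength on the natural-log scale to the base-2 scale.

    Standard change of base: e**x == 2**(x / ln 2).
    """
    return strength_base_e / math.log(2)

def goal_ratio(strength_a: float, strength_b: float) -> float:
    """Expected goals by the stronger team per goal by the weaker team,
    given two base-2 total strengths (hypothetical helper)."""
    difference = abs(strength_a - strength_b)
    return 2.0 ** difference

# Differences of 1, 2 and 6 reproduce the 2x, 4x and 64x examples in the text.
for diff in (1, 2, 6):
    print(f"difference {diff}: stronger team scores {goal_ratio(diff, 0.0):.0f}x as much")
```

Because only the difference matters, the same function works whether the opponent is the median team (strength 0) or any other team.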

I've also added a "div" column. These are "LastPlanet" divisions, and each one represents a group of teams that are expected to be competitive with each other. The mean goal difference (GD) in a match between the top and bottom teams in a division is 1, so games within a division should be tight. Division U1 starts at the median team and goes up. Division L1 starts at the median team and goes down. My idea is to give you a sense of which teams are about the same strength. Just because teams are separated by a lot of ranks (e.g. team #60 versus #40) doesn't mean they aren't pretty similar in strength.

I now scale all teams to the median select team in WA and OR. I have to scale to something, and the median team in the whole database is prone to bias if I do not have all the select teams (both higher and lower) in the database. In WA and OR, we are familiar with the select leagues and I can be fairly sure that the WA-OR database is not under-sampling the weaker leagues. In other states, it is likely we are missing the lower leagues because the scores for these tend to be buried in district association webpages somewhere (at least they are in Washington).

Finally, the boys' rankings are now joint age rankings (so U12-U16 all together). This means that the games a team plays when playing up are also counted in its rank. How do I account for older teams being stronger? In my analyses, teams don't have an "age" designation. They just have a strength. Younger teams tend to be weaker, but I don't impose that. It just happens because they tend to lose more against older teams. But when I display the age ranks, I display the strengths scaled to the median strength of teams at that age.
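As a rough illustration of that last step, here is a small Python sketch. It assumes that "scaled to the median" means subtracting the age group's median strength, which is the natural operation on a log scale but is my assumption here. The team names, strengths, and the function name are made up for the example.

```python
from collections import defaultdict
from statistics import median

def scale_to_age_median(teams):
    """Re-center each team's strength on the median strength of its age group.

    `teams` is a list of (name, age_group, strength) tuples. Subtraction is
    an assumption: on a log scale it corresponds to dividing scoring rates.
    """
    strengths_by_age = defaultdict(list)
    for _, age, strength in teams:
        strengths_by_age[age].append(strength)
    medians = {age: median(vals) for age, vals in strengths_by_age.items()}
    return [(name, age, strength - medians[age]) for name, age, strength in teams]

# Hypothetical strengths from one joint fit across ages.
teams = [
    ("Team A", "U12", 1.0),
    ("Team B", "U12", 2.0),
    ("Team C", "U12", 3.0),
    ("Team D", "U14", 2.5),
    ("Team E", "U14", 4.5),
]
for name, age, scaled in scale_to_age_median(teams):
    print(name, age, scaled)
```

The joint fit never uses the age label. It only comes in at display time, when each age group is re-centered on its own median.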
