Tuesday, November 5, 2013

Code

If you want to do your own rankings of teams

I've released an R package, fbRanks, for ranking teams using Dixon and Coles' Poisson regression approach, which is what I use for the rankings here. Start by reading my tutorial: Basic Team Ranking. Make sure you are getting fbRanks 1.1 (it should show up by April 2nd). If the Mac version is not up yet, wait a few days; the R servers need to build it, and I uploaded 1.1 on March 30th. What's R and how do I get it? Go to the R project site. How do I use it? Look there, and then I suggest you use RStudio as your R platform. All free. All open-source. All cross-platform.

If you want to help us rank U.S. west coast youth teams

Two questions we often get are 'Where are the rest of the girls' ages?' and 'Why G01s and not other ages?'  This is a volunteer effort to provide regional ratings.  A volunteer with a G01 was interested in seeing the WA/OR ranks and then in seeing how WA compares to CA.

We don't have volunteers yet to be 'data stewards' for the G00-G94s.  Hopefully, for 2013-2014, some volunteers will step up to help maintain the girls' data.  The G01 rankings bring in half the traffic on the site, with hits coming from up and down the west coast, so there is definitely high interest in regional ranks of the girls' teams.

The set-up for scraping league and tournament data is getting more and more automated, so getting match data won't be as much of an undertaking as it was this year.  Still, maintaining the match data does require a weekly commitment.  The major effort is sorting out the teams: figuring out which team names belong to which teams and clubs.  Teams unfortunately use 5-10 (yes, 10) different variants of their name in different leagues, and often leave off the club, the age, and whether they are the A, B, C (or white, green, blue) team in the club.  Thus it takes some effort and web research to develop the team name tables.

Here's what getting another age up on the blog involves:

1) You assemble an Excel file of matches that looks like this and email it to me periodically:
https://www.dropbox.com/sh/3wttk4gbxd225w2/O-wAivsFpm/example_match_entry.csv
Include all the teams you want ranked, using whatever set of games you want.  You can enter the matches by hand, or you can use my webscrapers, which are easy to use.

2) You assemble a team file with info on each team that looks like so:
https://www.dropbox.com/sh/3wttk4gbxd225w2/q7c4-yAAfq/teams-master.csv
The team names should match the team names in the match file above.
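
Before emailing the files, it can help to check that every team name in the match file has an entry in the team file. Here is a minimal sketch of such a check, in Python rather than R. It is not part of fbRanks. The match-file column names `home.team` and `away.team` and the tiny inline data are assumptions for illustration; check the example files for the real headers.

```python
import csv
import io

def unmatched_teams(match_rows, team_rows):
    """Return match-file team names that have no entry in the team file."""
    known = {row["name"].strip() for row in team_rows}
    seen = set()
    for row in match_rows:
        # Column names are assumed; adjust to the headers in your match file.
        seen.add(row["home.team"].strip())
        seen.add(row["away.team"].strip())
    return sorted(seen - known)

# Made-up data standing in for the match file and the team file.
matches_csv = """date,home.team,home.score,away.team,away.score
2013-09-07,Rovers G01 A,2,United G01,1
2013-09-14,United G01,0,Rovers G01,3
"""
teams_csv = """name,club
Rovers G01 A,Rovers
United G01,United
"""

match_rows = list(csv.DictReader(io.StringIO(matches_csv)))
team_rows = list(csv.DictReader(io.StringIO(teams_csv)))

print(unmatched_teams(match_rows, team_rows))  # ['Rovers G01']
```

Any name that is printed either needs its own row in the team file or is a variant of an existing team's name that should be replaced.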

The example files are also here:
https://www.dropbox.com/sh/3wttk4gbxd225w2/hu3B2SqQWe
  • example_match_entry.csv  shows how to enter match data in an Excel file (comma delimited)
  • teams-master.csv  shows a sample team file (comma delimited)
  • team-resolver.csv  Sadly, teams use many different names in leagues and tournaments.  If you don't want to do search and replace on every variant of a name that a team uses, then I need a team-resolver file like the one in the folder.  The name column matches the name column in teams-master.csv, while the alt.name columns hold all the different alternative names the team has used in leagues and tournaments.  My code will search for the alternative names in your match file and replace them with a standard display name.
  • match-resolver.csv  If you want to webscrape, this is the file I use.  You don't need to scrape, but if you are trying to do multiple states, it definitely helps.
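
To make the idea behind the team-resolver file concrete, here is a minimal sketch of name resolution in Python. It is an illustration only, not the code used for the rankings on this site. It assumes the alternative names sit in columns whose headers start with `alt.name`, that the match file uses `home.team` and `away.team` columns, and that names are compared exactly apart from case and surrounding spaces. All team names below are made up.

```python
import csv
import io

def build_lookup(resolver_rows):
    """Map each alternative name (and the display name itself) to the display name."""
    lookup = {}
    for row in resolver_rows:
        display = row["name"].strip()
        lookup[display.lower()] = display
        for col, value in row.items():
            # Assumed layout: any column starting with "alt.name" holds a variant.
            if col and col.startswith("alt.name") and value and value.strip():
                lookup[value.strip().lower()] = display
    return lookup

def resolve_matches(match_rows, lookup):
    """Replace known variants with display names; collect names that are not recognized."""
    resolved, unknown = [], set()
    for row in match_rows:
        new_row = dict(row)
        for col in ("home.team", "away.team"):  # assumed column names
            raw = row[col].strip()
            if raw.lower() in lookup:
                new_row[col] = lookup[raw.lower()]
            else:
                unknown.add(raw)
        resolved.append(new_row)
    return resolved, sorted(unknown)

# Made-up data standing in for team-resolver.csv and a match file.
resolver_csv = """name,alt.name.1,alt.name.2
Rovers G01 A,Rovers G01,Rovers 01 White
United G01,Utd G01,
"""
matches_csv = """date,home.team,home.score,away.team,away.score
2013-09-07,Rovers 01 White,2,Utd G01,1
2013-09-14,United G01,0,Mystery FC,3
"""

lookup = build_lookup(list(csv.DictReader(io.StringIO(resolver_csv))))
resolved, unknown = resolve_matches(list(csv.DictReader(io.StringIO(matches_csv))), lookup)

for row in resolved:
    print(row["home.team"], "vs", row["away.team"])
print("Not in resolver:", unknown)
```

Names that come back in the "not in resolver" list are the ones that still need web research: each is either a new team or one more variant to add to an existing team's row.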
If you feel like taking on an age or region that I don't cover, then look over the files and contact me:  lastplanetranking@gmail.com