Rating system description



Introduction

A rating system is an attempt to quantitatively represent the strengths of individuals engaging in a competitive activity. Perhaps the best known rating system is the so-called Elo system, developed by Arpad Elo for rating chess players. The Elo system is related to methods used in psychology and other areas for paired preference comparisons. Similar systems have been used with other games, such as go, and for competitive sports such as table tennis. These systems produce a rating for each participant, using a formula that predicts game outcomes from the difference between the two contestants' ratings.

This soccer rating system produces a rating for each team based on the match outcomes: win, loss, or tie. It takes into account who the opponents were, the home field advantage, and a seed rating based on the historical record. Crudely speaking, a team with an even record against opponents averaging 1500 will have a rating around 1500. A team winning 2/3 of its matches against teams rated 1500 will have a rating around 1600, and so on. The precise value will depend on the details of the opponents' ratings and the match venues.

The ratings of women's teams are not comparable to ratings of men's teams. Ratings across the different divisions are reasonably well calibrated, but I can't guarantee perfect calibration due to the small number of inter-divisional matches played.

Interpretation and Prediction

The ratings are not an absolute scale; only the differences between ratings are informative. Here is a brief table of winning probabilities for the higher rated team, ignoring home field effects:
     Rating
      Diff     Prob    Odds
         0     0.500    1:1
       100     0.667    2:1
       200     0.800    4:1
       300     0.889    8:1
       400     0.941   16:1
       500     0.970   32:1
       600     0.985   64:1
       700     0.992  128:1
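
The table entries are consistent with the odds doubling for every 100 points of rating difference. The short sketch below reproduces the table under that assumption. The doubling formula and the function names are inferred from the table for illustration, not taken from the rating system's own code, and ties are ignored because their treatment is not described here.

```python
def win_odds(diff, scale=100.0):
    """Odds in favor of the higher rated team.

    Assumption: odds double every `scale` rating points, as the table suggests.
    """
    return 2.0 ** (diff / scale)


def win_probability(diff, scale=100.0):
    """Convert the odds into a winning probability."""
    odds = win_odds(diff, scale)
    return odds / (1.0 + odds)


# Print the same columns as the table: difference, probability, odds.
for diff in range(0, 800, 100):
    print(f"{diff:5d}  {win_probability(diff):.3f}  {win_odds(diff):.0f}:1")
```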

The average home field advantage in collegiate soccer is around 50 rating points for both men and women. Thus if the home team is rated 50 points higher than the visitor, then the effective rating difference is about 100 points. You should not expect the higher rated team to win all the time: for example, looking at all matches featuring a 100 point rating difference, the higher rated team should win 2/3 of the time.
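
As a rough illustration of the home field adjustment, the sketch below adds 50 points to the home team's rating before computing the winning probability. It assumes the odds double for every 100 points of effective difference, which matches the table above; the function names and the exact form of the adjustment are illustrative assumptions.

```python
HOME_ADVANTAGE = 50.0  # average home field advantage quoted in the text


def effective_difference(home_rating, visitor_rating, home_advantage=HOME_ADVANTAGE):
    """Rating difference from the home team's point of view, venue included."""
    return (home_rating + home_advantage) - visitor_rating


def home_win_probability(home_rating, visitor_rating):
    """Assumed model: odds of 2 ** (difference / 100) in favor of the home team."""
    odds = 2.0 ** (effective_difference(home_rating, visitor_rating) / 100.0)
    return odds / (1.0 + odds)


# Home team rated 50 points higher: effective difference of 100 points.
print(round(home_win_probability(1550, 1500), 3))
# Visitor rated 50 points higher: the home advantage cancels the gap.
print(round(home_win_probability(1500, 1550), 3))
```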

Teams are ranked in the ratings tables on the basis of their ratings, but the ranks have no direct interpretation independent of the ratings.

In the ratings tables, the column labeled SE (standard error) is an estimate of the uncertainty in the rating. You should think of the rating as (R ± SE).

The column under Opp is the median rating of each team's opponents. This is an indicator of the strength of schedule: half of the opponents were rated at or above this value.
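
For concreteness, a value like the one in the Opp column could be computed as follows. The opponent ratings are made up for the example, and the function name is hypothetical.

```python
from statistics import median


def opponent_strength(opponent_ratings):
    """Median rating of a team's opponents, a simple strength-of-schedule measure."""
    return median(opponent_ratings)


# Hypothetical schedule of five opponents.
schedule = [1420, 1480, 1510, 1565, 1630]
print(opponent_strength(schedule))  # 1510
```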

Technical details

The probability model underlying the rating system assumes that the match outcomes are independent. This is a strong assumption, and may not be exactly correct. For example, if several key players on a team are injured for a series of matches during the season, then the team may be functioning at a lower level until those players return. This induces dependence (correlation) in the results. The college season is short, which makes it difficult to estimate these effects; my belief is that the average performance level over the season is estimated well in spite of the possible failure of strict independence.

There are various methods of estimation which might be used to estimate the ratings, given the outcomes of matches. One standard method, maximum likelihood estimation (ML), has nice properties if there are large numbers of matches for each team. ML corresponds to choosing the values for the unknown parameters (ratings) which lead to the maximal probability for the observed results. Unfortunately, if a team wins either none or all of its games, then the ML estimate of its rating is undefined: the probability of the observed results keeps increasing as the rating is pushed toward minus or plus infinity. The solution used here is to make use of Bayesian methods. With this approach the ratings also depend on what is known as a prior distribution, based on the previous year's end-of-season ratings. Early in the season this will tend to produce ratings close to the prior or seed ratings, but as the results accumulate during the season, the rating will be more dependent on the current results and less dependent on the seed values. Most soccer programs have considerable year-to-year continuity, and this helps produce accurate ratings as early as possible in the season. For teams that undergo radical change between years, the ratings will be able to respond to the accumulation of new results during the season, producing accurate ratings for all teams by the end of the season.
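
The effect of the prior can be seen in a small sketch for a single team whose opponents' ratings are treated as known. This is not the estimation procedure used for the published ratings, whose details are not given here. It assumes a win probability with odds doubling every 100 points, a Gaussian prior centered on the seed rating with an arbitrary standard deviation of 150 points, and no ties. All names are hypothetical.

```python
import math

K = math.log(2.0) / 100.0  # assumed scale: odds double every 100 points


def win_prob(rating, opp):
    """Assumed probability that a team rated `rating` beats one rated `opp`."""
    return 1.0 / (1.0 + math.exp(-K * (rating - opp)))


def posterior_slope(rating, matches, seed, prior_sd):
    """Derivative of the log posterior with respect to the team's rating.

    matches: list of (opponent_rating, result), with result 1 for a win and
    0 for a loss. The Gaussian prior around `seed` is an illustrative choice.
    """
    likelihood_term = sum(K * (result - win_prob(rating, opp))
                          for opp, result in matches)
    prior_term = -(rating - seed) / prior_sd ** 2
    return likelihood_term + prior_term


def map_rating(matches, seed, prior_sd=150.0):
    """Posterior mode found by bisection; the slope decreases as rating grows."""
    lo, hi = seed - 3000.0, seed + 3000.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if posterior_slope(mid, matches, seed, prior_sd) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# An undefeated team: ML alone has no finite answer, but the prior gives one.
undefeated = [(1500.0, 1)] * 5
print(round(map_rating(undefeated, seed=1500.0)))
# With no results yet, the estimate is just the seed rating.
print(round(map_rating([], seed=1500.0)))
```

With few matches the estimate stays close to the seed; as more results are added, the likelihood term dominates and the seed matters less, which is the behavior described above.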



Albyn Jones
jones@reed.edu