by Anthony Reinhard
We’ve arrived at Week 4 of the XFL season and we are starting to get a feel for how the rest of the year will shake out. In an effort to better acquaint myself with the strength of each team, I took a look at some new rating systems. Each system produces the same basic output: ratings average to zero, and the difference between two teams’ ratings is the expected point spread on a neutral field. Home field advantage is worth 2.6 points unless otherwise noted.

I’ll begin with my classic Elo ratings, which can be found at statbutler.com/xfl. These ratings use the same parameters as the NFL Elo model on FiveThirtyEight.com. Under an Elo system, teams always improve their rating with a win and see their rating drop with a loss. The winning team’s rating goes up by the same amount that the losing team’s goes down, and an underdog that wins gains more than a favorite would. The method also includes a point spread adjustment so that teams that win by more are rewarded. That adjustment is not linear, so winning by 28 is not twice as valuable as winning by 14.
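The update described above can be sketched in a few lines, assuming FiveThirtyEight’s published NFL parameters (K = 20 and a log-scaled margin-of-victory multiplier); the ratings and margins below are placeholders, not actual XFL numbers:

```python
import math

def elo_update(winner_elo, loser_elo, margin, k=20.0):
    """One Elo update with a FiveThirtyEight-style MOV multiplier.

    Returns (new_winner_elo, new_loser_elo). The shift is symmetric:
    the winner gains exactly what the loser drops.
    """
    # Win probability implied by the rating gap, from the winner's side.
    expected = 1.0 / (1.0 + 10.0 ** ((loser_elo - winner_elo) / 400.0))
    # Log-scaled margin-of-victory multiplier: winning by 28 is worth
    # less than twice winning by 14, and the fraction damps favorites
    # that run up the score.
    mov = math.log(abs(margin) + 1) * (2.2 / ((winner_elo - loser_elo) * 0.001 + 2.2))
    shift = k * mov * (1.0 - expected)
    return winner_elo + shift, loser_elo - shift
```

Because `expected` is larger for a favorite, the `(1.0 - expected)` term automatically hands an upset winner a bigger rating gain than a favorite gets for the same margin.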
My XFL version of Elo started with ratings that aligned with pre-season championship futures. I converted futures odds into implied probabilities and, through trial and error, arrived at pre-season ratings that produced championship probabilities in line with the consensus in Vegas. My thinking at the time was that this would help the model gain credibility faster, but I may have overestimated the ability of oddsmakers to evaluate teams that have never played before. This is particularly clear with a team like New York, who entered the season as the fourth-best team and promptly beat the second-ranked Tampa Bay Vipers by 20. New York took the top spot on the strength of a solid win against what the model thought was a good team. Tampa Bay would go on to drop their next two, and New York has been dominated by a margin of 56 to 9 in their last two games. Some models would react more quickly to New York’s blowout loss to DC in Week 2, but this version of Elo doesn’t move too aggressively and can take longer to pick up on this sort of new information. You’ll also notice that Dallas (pre-season #1) is still ranked ahead of undefeated Houston and Seattle (pre-season #8) is ranked below winless Tampa Bay. These could also be examples of the model acting too slowly.
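The odds-to-probability conversion works like this; the futures prices below are made-up placeholders, not the actual pre-season board:

```python
def implied_prob(american_odds):
    """Convert American odds to a raw implied win probability."""
    if american_odds > 0:
        return 100.0 / (american_odds + 100.0)
    return -american_odds / (-american_odds + 100.0)

# Made-up futures board (not the real pre-season prices). Raw implied
# probabilities sum to more than 1 because of the vig; dividing by that
# sum gives fair probabilities to target with the pre-season ratings.
futures = {"Dallas": 300, "Tampa Bay": 400, "New York": 550, "Houston": 600}
raw = {team: implied_prob(odds) for team, odds in futures.items()}
overround = sum(raw.values())
fair = {team: p / overround for team, p in raw.items()}
```

The trial-and-error step is then a matter of nudging pre-season Elo ratings until simulated championship probabilities land near these fair values.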

One solution to the problem of questionable pre-season rankings is to start every team with a rating of zero in Week 1. With this method, the rankings look closer to what we might have expected. However, you’ll notice that the gap between the best and worst teams is still pretty narrow, as Tampa Bay would be less than a five-point underdog against Houston on a neutral site. As I mentioned, Elo has a tendency to move slowly. So what if we doubled the K-factor, pushing the model to react more strongly to recent results?
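A minimal sketch of the flat-start variant, with a made-up schedule, shows how a larger K spreads the ratings faster. These are raw Elo-scale ratings; translating a gap into a point spread needs a divisor (roughly 25 Elo points per point of spread in FiveThirtyEight’s NFL model, which I’m assuming here):

```python
import math

def flat_start_elo(games, k):
    """Run one pass through a season from a flat starting rating of zero.

    Ratings stay zero-sum because every game's shift is symmetric.
    """
    ratings = {}
    for winner, loser, margin in games:
        rw = ratings.setdefault(winner, 0.0)
        rl = ratings.setdefault(loser, 0.0)
        expected = 1.0 / (1.0 + 10.0 ** ((rl - rw) / 400.0))
        mov = math.log(abs(margin) + 1) * (2.2 / ((rw - rl) * 0.001 + 2.2))
        shift = k * mov * (1.0 - expected)
        ratings[winner] = rw + shift
        ratings[loser] = rl - shift
    return ratings

# Made-up results: doubling K roughly doubles how far the ratings
# spread from zero after a single pass through the schedule.
games = [("HOU", "TB", 10), ("DC", "NY", 20), ("HOU", "DC", 3)]
```

Running `flat_start_elo(games, 40.0)` produces a noticeably wider spread between the top and bottom teams than `flat_start_elo(games, 20.0)` on the same results.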

This comes out closer to the Vegas spreads for this week, but as the season goes on, these ratings might change too frequently. An illustration of this is LA passing DC after only one game. While LA did beat DC by 30, we’re putting an awful lot of stock in one game when each team has a pair of other games that suggest this was, in fact, an upset.

The simple rating system doesn’t put any extra emphasis on the most recent game. It weights all games evenly, but is driven heavily by point differential and strength of schedule. St. Louis has played two of the top three teams and also has a point differential of +22, second best in the league. It isn’t unreasonable that they would be the top team, but it is doubtful they would be favored over Tampa Bay by more than 21 points on a neutral field. You can read more about SRS in this post on Pro Football Reference.
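SRS can be computed by solving a small linear system where each team’s rating equals its average margin plus the average rating of its opponents; the games below are hypothetical stand-ins for the real schedule:

```python
import numpy as np

# Hypothetical results: (team, opponent, point margin for the first team).
games = [("STL", "NY", 14), ("HOU", "TB", 7), ("STL", "HOU", -3), ("NY", "TB", 10)]

teams = sorted({t for g in games for t in g[:2]})
idx = {t: i for i, t in enumerate(teams)}
n = len(teams)

# Build: rating_i - avg(opponent ratings) = avg margin_i for each team.
a = np.zeros((n, n))
b = np.zeros(n)
counts = np.zeros(n)
for team_a, team_b, margin in games:
    i, j = idx[team_a], idx[team_b]
    b[i] += margin
    b[j] -= margin
    a[i, j] -= 1.0
    a[j, i] -= 1.0
    counts[i] += 1.0
    counts[j] += 1.0
a /= counts[:, None]
b /= counts
a += np.eye(n)
# The system is rank-deficient by one, so replace the last equation
# with the constraint that the ratings average to zero.
a[-1, :] = 1.0
b[-1] = 0.0
ratings = dict(zip(teams, np.linalg.solve(a, b)))
```

Because every rating folds in the opponents’ ratings, beating a strong team by a modest margin can outrank beating a weak one by a lot — which is exactly how St. Louis climbs this list.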

The final set of rankings is by far the most comprehensive. I created these ratings using only the Vegas point spread from each of the 16 games this season and nothing else. The idea is that the betting market picks up on all meaningful information in the public domain, and a simple linear model can extract it. The downside is that a model built this way is opaque, and it can be difficult to tell how the ratings will change from week to week. That makes these ratings dangerous for season-long forecasts, but they would be my ideal power rankings for this week’s games. You can read more about this kind of model in this very very old post on inpredictable.com.
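One way to back ratings out of betting lines is an ordinary least-squares fit with a dummy column per team plus a home-field term; the spreads below are invented for illustration, not the actual 16 lines:

```python
import numpy as np

# Invented Vegas lines from the home team's point of view
# (negative = home team favored); the real fit uses all 16 games.
lines = [("HOU", "STL", -3.5), ("DC", "NY", -6.5), ("DAL", "LA", -2.5), ("TB", "SEA", -1.0)]

teams = sorted({t for g in lines for t in g[:2]})
idx = {t: i for i, t in enumerate(teams)}

# Design matrix: +1 for the home team, -1 for the away team, and a
# constant home-field-advantage column. The target is the expected
# home margin implied by the spread.
x = np.zeros((len(lines), len(teams) + 1))
y = np.zeros(len(lines))
for g, (home, away, spread) in enumerate(lines):
    x[g, idx[home]] = 1.0
    x[g, idx[away]] = -1.0
    x[g, -1] = 1.0
    y[g] = -spread
coefs, *_ = np.linalg.lstsq(x, y, rcond=None)
team_coefs = coefs[:-1]
ratings = dict(zip(teams, team_coefs - team_coefs.mean()))  # center at zero
home_field = coefs[-1]
```

Centering the team coefficients doesn’t change any predicted spread (only rating differences matter), and it puts the output on the same average-of-zero scale as the other systems in this post.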