New Paper: SCOPE: Selective Cross-validation Over Parameters for Elo

Posted on 09/20/2019 | By: Rogelio E. Cardona-Rivera

QED Lab member Alex Bisberg has had his paper, SCOPE: Selective Cross-validation Over Parameters for Elo, accepted for presentation at the Poster Session of the 15th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), to be held in Atlanta, GA, USA from October 8th through 12th, 2019.

Precisely quantifying “skill” in competitive games is of interest for a variety of reasons. Close, dramatic games are fun to watch and play, and games between teams or players of similar skill tend to produce more uncertain, dramatic results. Players who queue up on their own would also rather play close matches, so skill quantification is important for matchmaking. How best to quantify skill remains an open problem. Esports are “multiplayer video games played competitively for spectators, typically by professional gamers,” which makes them a good medium in which to study this problem.

In the 1950s, Arpad Elo, a Hungarian chess player and mathematician, invented the Elo model to quantify the skill -- and win probability -- of chess players as they progressed through tournaments. Elo's method, and modifications of its pairwise-comparison approach, have since been extended beyond individual board games to team sports and esports.

In this work, we demonstrate that Elo-based models can achieve high levels of win prediction accuracy while retaining their interpretability. Methods for adapting Elo are typically unsystematic and developed on a case-by-case basis. To address these limitations, we developed SCOPE.

The basic idea behind Elo is to update our estimate of a player or team’s skill over time based on whether they win or lose. A strength, and potential weakness, of this model is that win/loss data is the only data Elo’s version of the model uses. The model parameters adjust how sensitive the model is to each win and loss. One could imagine that winning a game of MLB baseball, a sport with over 100 games a season, would change our perception of a team’s skill much less than winning an NFL football game, of which we only see 16 each season. Next, given some statistical assumptions, it’s possible to convert the difference between two teams’ skill scores into a win probability. The higher-rated team’s win probability gets larger the farther apart the two scores are. Later versions and modifications of the model integrate data from the margin of victory. We feel it makes sense that a team that wins by a lot is showing more skill than one that wins a close game.
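To make the update and the win probability concrete, here is a minimal Python sketch of a conventional Elo step. It is an illustration under stated assumptions, not the exact model from the paper: the logistic curve with base 10 and a scale of 400 is the usual chess convention, and the function names and the values of k are hypothetical choices for this example.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Win probability for A against B.

    Assumes the conventional logistic Elo curve (base 10, scale 400);
    the paper's own parameterization may differ.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update(rating_a, rating_b, a_won, k=20.0):
    """Return both ratings after one game.

    k controls how sensitive the ratings are to a single result.
    The default of 20 is an illustrative value, not one from the paper.
    """
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    # Whatever A gains, B loses, so the total rating is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two evenly rated teams: each is given a 50% chance of winning.
    print(expected_score(1500, 1500))
    # A larger k (fewer games per season) moves ratings more per result.
    print(update(1500, 1500, a_won=True, k=20.0))
    print(update(1500, 1500, a_won=True, k=4.0))
```

With a smaller k, as might suit a long season, a single win between evenly rated teams barely shifts either rating; with a larger k, the same win shifts them more. Choosing such parameters systematically rather than by hand is the kind of decision cross-validation is meant to support.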

After cross-validating the model with all of the identified parameters, we were able to raise the overall win prediction accuracy from 56% to 68% on a Call of Duty esports dataset. We appreciate the effort of Dean Wyatte and Justin Shacklette from Activision in providing us with clean, structured data. In the future, we plan to apply this technique to other esports and compare skill expression across different game genres.