Do Some Regions Have Inflated Ratings?

The question of “rating inflation” in different regions or divisions has long been debated. Some argue that places like New York and New Jersey have inflated ratings because they host numerous tournaments with highly rated fencers. Others believe that these areas have deflated ratings because it’s hard to perform well enough to earn a rating against such strong competition. Judging rating inflation from personal experience alone is tricky, given how much skill varies among fencers of the same rating. However, with the power of statistics, we can look at a large sample of data to find which regions truly have rating inflation!

USA Fencing Region Map

For reference, here is the USA Fencing region map:

Measuring rating inflation

Let’s focus on Cadet and Junior pool bouts since these two age groups are fairly homogeneous and have a wide range of skill levels. Additionally, let’s limit the bouts to those that took place in 2019 or 2020 so the results remain comparable to the present. Even with these restrictions, we still have over 15,000 bouts for each weapon, providing a substantial sample size.

To measure rating inflation, we use multiple linear regression. The independent variables are the fencer region, opponent region, fencer rating, and opponent rating. The dependent variable is the score difference in the bout. Since win/loss and score difference depend on whose perspective the bout is viewed from, each bout is randomly assigned a perspective.
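
The random perspective step can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used for the analysis: the `assign_perspective` function, the dict layout, and the example bout are all hypothetical.

```python
import random

def assign_perspective(bout, rng):
    """Return one regression row for a bout, viewed from a randomly chosen side.

    `bout` is a hypothetical dict with keys 'a' and 'b', each holding that
    fencer's 'region', 'rating', and 'score' in the bout.
    """
    # Flip a fair coin to decide whose point of view the row takes.
    fencer, opponent = ("a", "b") if rng.random() < 0.5 else ("b", "a")
    f, o = bout[fencer], bout[opponent]
    return {
        "fencer_region": f["region"],
        "opponent_region": o["region"],
        "fencer_rating": f["rating"],
        "opponent_rating": o["rating"],
        # Positive when the chosen fencer outscored the opponent.
        "score_diff": f["score"] - o["score"],
    }

# Made-up bout, purely for illustration.
bout = {
    "a": {"region": 6, "rating": "B", "score": 3},
    "b": {"region": 4, "rating": "B", "score": 5},
}
row = assign_perspective(bout, random.Random(0))
print(row)
```

Each bout contributes exactly one row, so the same bout is never counted once as a win and once as a loss.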

By controlling for individual skill (ratings), we can isolate the effect of region on score difference and win rate. In a scenario without rating inflation, regional controls wouldn’t influence the results, as fencer ratings alone would fully explain the score difference or win/loss. On the flip side, if one region’s coefficient differs from another’s by a statistically significant margin, it suggests that ratings there are inflated or deflated.
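
Written out, one plausible form of the model looks like this. It is a sketch under assumptions: the post does not say how regions and ratings were coded, so here each is treated as a categorical variable with one level held out as the baseline, and all symbols are introduced only for illustration.

```latex
% Score difference for bout i, seen from the randomly chosen fencer f_i
% against opponent o_i.
\Delta_i = \beta_0
         + \gamma_{\mathrm{region}(f_i)}
         + \delta_{\mathrm{region}(o_i)}
         + \theta_{\mathrm{rating}(f_i)}
         + \phi_{\mathrm{rating}(o_i)}
         + \varepsilon_i
```

With no rating inflation, every region term $\gamma$ and $\delta$ would be indistinguishable from zero, and the rating terms $\theta$ and $\phi$ would carry all of the explanatory weight.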

Which regions have the most inflation?

Each dot represents a region, shown with its 95% confidence interval.
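
For readers who want to see how such an interval is read, here is a minimal sketch assuming the usual large-sample normal approximation (estimate plus or minus 1.96 standard errors). The function names are hypothetical and the numbers are made up for illustration; they are not results from this analysis.

```python
def confidence_interval(estimate, std_err, z=1.96):
    """95% interval under a normal approximation (z = 1.96)."""
    return (estimate - z * std_err, estimate + z * std_err)

def differs_from_baseline(estimate, std_err, z=1.96):
    """True when the interval excludes zero, i.e. the region's coefficient
    is significantly different from the baseline region at the 5% level."""
    low, high = confidence_interval(estimate, std_err, z)
    return low > 0 or high < 0

# Made-up coefficients and standard errors, purely for illustration.
print(differs_from_baseline(-0.55, 0.12))  # interval lies entirely below zero
print(differs_from_baseline(0.10, 0.15))   # interval straddles zero
```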

Foil

In foil, Region 6 had the most inflated ratings. A Region 6 fencer would, on average, be more than half a point down against a Region 4 fencer with the same rating in a 5-touch bout. Regions 2 and 5 also had inflation, but it was less severe: their fencers would be only about half a point down against a Region 4 fencer of the same rating. Regions 1, 3, and 4 were not significantly different from one another.

Epee

In epee, Region 1 was the most deflated, scoring significantly more points against same-rated fencers from other regions. Regions 3, 4, and 5 were all about in the middle, neither inflated nor deflated. Regions 2 and 6 were the most inflated, on average being half a point down against a Region 1 fencer.

Saber

Region 2 is the most inflated in saber, on average being 0.6 points down against a Region 3 fencer in a 5-touch bout. Regions 3, 4, 5, and 6 are all a bit deflated, but all of them are relatively similar to one another. Region 1 is the closest to a middle ground, not being substantially inflated or deflated.

Overall

For all three weapons:

Region 2 had rating inflation.
Region 4 never substantially differed from Region 3.
Generally, most regions’ ratings were inflated compared to Region 3.
Within each weapon, there were always a few other regions that had rating inflation or deflation.

This means that rating inflation is a real phenomenon.

Additionally, this suggests that populous regions are (generally) deflated in comparison to other regions. Regions 3 and 4 are the most populous and were always on the right (deflated) side of the graph.

Also, note that the average gap between two “adjacent” ratings in pools is about half a point in epee (this gap is about 0.8 points in foil and saber). This means that in epee, fencers from Region 6 are usually rated about one rating higher than similarly skilled Region 1 fencers.
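
The same arithmetic can be applied to any weapon: divide the regional gap by the gap between adjacent ratings. The sketch below uses only figures quoted in this post; the `rating_equivalents` helper is a hypothetical name, and the results are rough, since the underlying gaps are approximate averages.

```python
# Average pool-bout gap between adjacent ratings, as quoted above.
ADJACENT_RATING_GAP = {"epee": 0.5, "foil": 0.8, "saber": 0.8}

def rating_equivalents(weapon, regional_gap):
    """Express a regional score gap (in points) as a number of rating steps."""
    return regional_gap / ADJACENT_RATING_GAP[weapon]

# Epee: Regions 2 and 6 are about half a point down against Region 1.
print(rating_equivalents("epee", 0.5))             # 1.0 rating step
# Saber: Region 2 is about 0.6 points down against Region 3.
print(round(rating_equivalents("saber", 0.6), 2))  # 0.75 of a rating step
```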

What does this mean?

At national tournaments, where fencers without national points are randomly seeded, rating inflation increases the element of chance in pools. For example, a fencer who draws a B-rated opponent from Region 2 in their pool is very lucky compared to a fencer who draws a B-rated opponent from Region 3, even though these two B-rated fencers could just as easily have been swapped in the seeding. However, this problem is difficult to solve, as even Elo or other performance ranking systems are subject to regional inflation and deflation.

If you enjoyed, please follow!