The Tale of the Tape: Exploring the UFC Fight Library
Background:
As an avid fan of the UFC for the last decade, I have seen a countless, borderline excessive, number of fights. I have seen enough of them to tell you that it is nearly impossible to know which fighter will emerge victorious. This is because in the sport of MMA, many variables contribute to the outcome of a fight, including physical attributes such as age and size in addition to the kicks, punches, elbows, knees, and grappling. With the most “stacked” fight card of the year and the due date of my Exploratory Data Analysis project falling on the same weekend, I began to wonder what statistical insights, if any, I could draw from previous UFC bouts to better predict the outcome of the main event on Saturday.
Dataset:
To perform an analysis of previous UFC fights, I found a dataset on Kaggle containing information on over 4,500 UFC fights from March 2010 to February 2021. The dataset included fighter names, fight outcomes, physical statistics such as height, reach, and weight, performance statistics such as significant strikes landed and takedowns landed, and fighter history, including each fighter’s win-loss record.
Objective:
My initial curiosity in exploring the dataset revolved around “the Tale of the Tape.” As pictured above, the Tale of the Tape refers to variables including a fighter’s Age, Height, Weight, and Reach. My objective was to discern what correlation, if any, I could find between these variables and the outcome of a fight. However, in performing this analysis and the remainder of the analyses I conducted for this project, I applied a universal rule within the mixed martial arts community: “matchups make fights.” Therefore, in my analysis of Age, Height, Weight, and Reach, I measured each variable relative to the fighter’s opponent. For example, if a fighter was 1 inch shorter than his opponent, his height would be recorded as a “height difference” of -1.
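The “difference relative to the opponent” idea can be sketched in a few lines of Pandas. This is only an illustration under assumptions: the column names (age, height_in, reach_in and their opp_ counterparts), the fighter labels, and the toy rows are hypothetical and not taken from the Kaggle file, which likely needs reshaping before it has one row per fighter per bout.

```python
import pandas as pd

# Hypothetical column names; the real dataset's columns will differ.
TAPE_COLUMNS = ["age", "height_in", "reach_in"]

def add_matchup_differences(bouts):
    """Add a <col>_diff column: the fighter's value minus the opponent's."""
    out = bouts.copy()
    for col in TAPE_COLUMNS:
        out[f"{col}_diff"] = out[col] - out[f"opp_{col}"]
    return out

# Toy data: one bout seen from each fighter's perspective.
bouts = pd.DataFrame({
    "fighter": ["Fighter A", "Fighter B"],
    "age": [29, 33], "opp_age": [33, 29],
    "height_in": [70, 71], "opp_height_in": [71, 70],
    "reach_in": [72, 74], "opp_reach_in": [74, 72],
})
with_diffs = add_matchup_differences(bouts)
print(with_diffs[["fighter", "age_diff", "height_in_diff", "reach_in_diff"]])
```

Fighter A is 1 inch shorter than the opponent, so the height difference on that row is -1, matching the example in the text.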
Variable 1: Age
I began my analysis by examining how a fighter’s age correlates with the probability of winning a fight. To do this, I read the UFC fight library as a CSV file in Python and loaded it into a data frame using Pandas. I then calculated percentiles for age difference across the dataset. Using the data frame, I calculated the win probability in each percentile grouping by dividing the total fights won in that percentile by the total fights in the same percentile. My hypothesis was that age difference would be moderately correlated with win probability.
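A minimal sketch of the percentile step described above, not my exact project code. It assumes one row per fighter per bout with hypothetical columns age, opp_age, and won (1 for a win, 0 for a loss), and it uses toy numbers in place of the Kaggle data. The number of percentile groups (n_bins) is also an assumption.

```python
import pandas as pd

def win_probability_by_percentile(fights, diff_col, won_col="won", n_bins=10):
    """Group rows by percentile of diff_col, then compute wins / fights per group."""
    # Percentile rank of each row's difference, in (0, 1]; ties share a rank.
    pct = fights[diff_col].rank(pct=True)
    # Split the percentile ranks into n_bins equal-width groups (0 = lowest).
    edges = [i / n_bins for i in range(n_bins + 1)]
    group = pd.cut(pct, bins=edges, labels=False, include_lowest=True)
    group = group.rename("percentile_group")
    won = fights.groupby(group)[won_col]
    # Mean of a 0/1 column equals fights won divided by total fights.
    return pd.DataFrame({"fights": won.count(), "win_probability": won.mean()})

# Toy data; with the real data this frame would come from pd.read_csv.
fights = pd.DataFrame({
    "age":     [25, 30, 28, 35, 31, 24, 38, 29],
    "opp_age": [30, 25, 35, 28, 24, 31, 29, 38],
    "won":     [0, 1, 1, 0, 0, 1, 0, 1],
})
fights["age_diff"] = fights["age"] - fights["opp_age"]

table = win_probability_by_percentile(fights, "age_diff", n_bins=2)
print(table)
```

Ranking first and binning the ranks, rather than cutting the raw differences, keeps the groups meaningful even when many bouts share the same age difference.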
Below are figures depicting win probability across percentiles of age difference. These results indicate that fighters generally do not gain a sizeable advantage from age unless the age difference is greater than seven years.
Tangent #1:
Although age difference appears to have a somewhat significant correlation with win probability, conventional mixed martial arts wisdom argues that age is less important in the Heavyweight division. This view is supported by Heavyweights such as Daniel Cormier, who claimed the heavyweight title at the ripe age of 39. To test the validity of this claim, I plotted the same figure alongside an additional figure filtered for the Heavyweight division.
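The Heavyweight comparison amounts to running the same calculation on a filtered subset. The sketch below is a simplified stand-in: instead of the full percentile plot, it compares the win rate of the older fighter overall and within Heavyweight. The weight_class, age_diff, and won columns and every row are hypothetical toy values.

```python
import pandas as pd

# Toy data, one row per fighter per bout; not real results.
fights = pd.DataFrame({
    "weight_class": ["Heavyweight", "Heavyweight", "Lightweight",
                     "Lightweight", "Heavyweight", "Heavyweight"],
    "age_diff": [6, -6, 8, -8, 3, -3],
    "won": [1, 0, 0, 1, 0, 1],
})

def older_fighter_win_rate(df):
    """Share of bouts won by the fighter who is older than the opponent."""
    older = df[df["age_diff"] > 0]
    return older["won"].mean()

overall = older_fighter_win_rate(fights)
heavyweight = older_fighter_win_rate(fights[fights["weight_class"] == "Heavyweight"])
print(f"overall: {overall:.2f}, heavyweight: {heavyweight:.2f}")
```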
Variable 2: Height
In my analysis of height, I repeated the process I used for age, this time using height difference instead. My hypothesis was that height would not have a large impact on fight outcome. This is because shorter, stockier fighters are often wrestlers who use their stature to their advantage through quick, explosive punches and takedowns.
Variable 3: Reach
In my analysis of reach, I again repeated the process I used for the first two variables. My hypothesis was that, unlike height, a significant reach difference would show a far stronger correlation with win probability. This is because fighters with a long reach can better control the range of striking and, in turn, the style and pace of the fight.
Variable 4: Weight
Because all MMA bouts are organized into specific weight classes, there typically is little variation in weight between opponents. The Heavyweight division is an exception, with some difference among fighter weights. However, the dataset I had access to did not provide a sufficiently detailed account of fighter weights. Therefore, I was forced to omit weight from my analysis.
Tangent #2: The “Wrestling Advantage”
Amongst MMA fans, there is a commonly held, meme-like belief that “wrestling is the best base for MMA.” This does not seem to be an unfounded claim, as the majority of current UFC champions have a collegiate wrestling background or an equivalently prestigious one. To find a correlation between a fighter’s wrestling advantage and win probability, I started with the variable takedowns per 15 minutes. This statistic describes the number of takedowns a fighter secures per 15 minutes, which is the standard length of a non-championship bout. I then calculated the difference in takedowns landed per 15 minutes between each fighter and their opponent and found the percentiles of that difference. Below are figures that demonstrate my findings.
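As a rough sketch of the takedown step, the differential and its percentile cutoffs can be computed as follows. The column names td_per_15 and opp_td_per_15, the chosen percentiles, and the toy values are assumptions for illustration only.

```python
import pandas as pd

# Toy data: average takedowns landed per 15 minutes for fighter and opponent.
fights = pd.DataFrame({
    "td_per_15":     [3.0, 0.5, 1.2, 1.0, 0.0, 4.5],
    "opp_td_per_15": [0.5, 3.0, 1.0, 1.2, 4.5, 0.0],
})

# Positive values mean the fighter lands more takedowns than the opponent.
fights["td_diff"] = fights["td_per_15"] - fights["opp_td_per_15"]

# Value of the differential at selected percentiles of the dataset.
cutoffs = fights["td_diff"].quantile([0.10, 0.25, 0.50, 0.75, 0.90])
print(cutoffs)
```

Because every bout appears once from each fighter’s perspective, the differentials mirror each other and the median sits at zero.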
Variable 5: Win/Loss Differential
In addition to the “Tale of the Tape,” I sought to discern whether fighter history correlated with an increased win probability. To do so, I calculated win probability across both win differential and loss differential.
The Main Event: Predicting Israel Adesanya vs. Jan Blachowicz
To put the data frame to the test, I filtered it on Age, Height, Reach, and Win/Loss Differential to derive a prediction for the main event based on these variables’ historical correlation with win probability.
Percentiles (From Adesanya’s Perspective):
Age Difference: 10%
Height Difference: 77%
Reach Difference: 72%
Win Difference: 5%
Loss Difference: 1%
Analysis:
After inputting the statistics above into my filter, I was surprised to find that the dataset contained no bout with the same combined differences in Age, Height, Reach, and Win/Loss Differential. As a result, I decided to pursue only the physical variables of Age, Height, and Reach. After additional tweaks to my filter to expand the percentile width (in order to broaden the sample size), I was able to find the following result.
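The filter-and-widen step could look something like the sketch below. It is not my exact filter: the function names, the list of widths, the minimum sample size, and the toy rows are all hypothetical. Only the three target percentiles (10%, 77%, and 72%) come from the list above.

```python
import pandas as pd

def similar_bouts(fights, targets, width):
    """Rows whose percentile rank is within +/- width of every target."""
    mask = pd.Series(True, index=fights.index)
    for col, target in targets.items():
        pct = fights[col].rank(pct=True)
        mask &= (pct - target).abs() <= width
    return fights[mask]

def widen_until(fights, targets, min_bouts, widths=(0.05, 0.15, 0.25)):
    """Try each width in turn and stop once enough bouts match."""
    width, subset = None, fights.iloc[0:0]
    for width in widths:
        subset = similar_bouts(fights, targets, width)
        if len(subset) >= min_bouts:
            break
    return width, subset

# Toy data, one row per fighter per bout; not real results.
fights = pd.DataFrame({
    "age_diff":    [-9, -7, -5, -3, -1, 1, 3, 5, 7, 9],
    "height_diff": [3, 2, 4, -1, 0, 1, -2, -3, -4, 5],
    "reach_diff":  [2, 3, 1, 0, -1, 4, -2, -3, -4, 5],
    "won":         [1, 0, 1, 0, 1, 0, 1, 0, 0, 1],
})

# Target percentiles from Adesanya's perspective.
targets = {"age_diff": 0.10, "height_diff": 0.77, "reach_diff": 0.72}
width, subset = widen_until(fights, targets, min_bouts=2)
print(width, len(subset), subset["won"].mean())
```

Widening the window trades precision for sample size: each extra bout admitted is less similar to the matchup in question, which is part of why the result should be read cautiously.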
Conclusion:
With such a small sample size, the statistical significance of this result is likely very low. In addition, these results imply a 23% increase in win probability for Adesanya, but this reflects only his comparative advantages in Age, Height, and Reach. If other practical variables were taken into account, such as wrestling defense or significant strike percentage, these probabilities could be very different. Nevertheless, these advantages are likely why Adesanya is a -450 favorite going into the bout.
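For context on the -450 figure, an American moneyline can be converted to an implied win probability with the standard formula. This sketch ignores the bookmaker’s margin, so the real implied probability is somewhat lower; the function name is my own.

```python
def implied_probability(moneyline):
    """Implied win probability from American odds, ignoring the bookmaker's margin."""
    if moneyline < 0:
        # Favorite: risk |moneyline| to win 100.
        return -moneyline / (-moneyline + 100)
    # Underdog: risk 100 to win moneyline.
    return 100 / (moneyline + 100)

adesanya = implied_probability(-450)
print(f"{adesanya:.1%}")
```

A -450 line works out to 450 / 550, or roughly an 82% implied chance of winning.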
The variables that seemed to have the strongest correlation with an increased win probability were Age and Takedown Differential. However, these results only showed moderate impacts on win probability in cases of outliers. Therefore, predicting the outcome of an MMA bout based solely on the Tale of the Tape is largely impractical. Additionally, my initial premise that the winner of a fight is extremely hard to predict is somewhat supported by these analyses.
In future analyses, it may be more practical to examine more fight-related statistics, similar to takedown differential, to obtain a more comprehensive win probability.