Creating AI Specifications to Produce a “Better” College Football Playoff Bracket
Written by Geoff Kreller, CRCM, CERP
Recently, we highlighted the difficulties of creating specifications and reward functions that enable agentic AI systems to work autonomously toward a desired outcome using only appropriate means. We thought it might be informative and fun to explore the specifications we would use to respond to a seemingly simple set of prompts (we did say “ask us anything”). With a number of people angry about their team’s exclusion from this year’s college football playoff, and others concerned that long layoffs hurt the quality of play, we were asked by several people (sadly, not the NCAA) to create a proposal to:
Increase overall satisfaction with college football playoff selection and scheduling
Make the selection process easier to understand (that is, clearly outline the specifications for entry)
Being an avid sports fan, I recognize it doesn’t matter whether the college football playoff (CFP) has four teams, eight teams or twelve teams: there will always be a debate about whether the “right” teams were left out or the “wrong” teams were let in, and the fifth, ninth, or 13th team will always feel snubbed. This year, Notre Dame (and to a lesser extent BYU) were the teams left out in the cold, and the Notre Dame fan base was quite clear in voicing its displeasure.
1. What are the difficulties?
Understanding and mitigating the architect’s bias is critical to creating a sound model. As a proud fan of the Maryland Terrapins, I know there is no sound methodology by which they could be included in this conversation for several years. It doesn’t matter to me which teams get invited or who actually wins the national championship. Ideally, I just want an entertaining, fulfilling resolution to the college football season. However, inherent bias creeps in from my preferences and experiences. I love the structure of college basketball’s March Madness tournament: conference champions are invited to the big dance regardless of their strength, there are tons of underdogs from lesser-known conferences to cheer for, and there is always the potential for huge upsets that send the odds-on favorite home early (go UMBC Retrievers!). Because of my preferences, I interpret a “better” bracket differently than others who might consider this same question.
Beyond bias, the lack of common opponents creates a significant problem for predictive college football rankings and analytics. In the NFL, you have 32 teams playing 17 games (each team playing 14 different opponents). In the NBA, NHL, and MLB, you have a schedule where every team plays everybody else. In NCAA football, you have more than 130 Football Bowl Subdivision (FBS) teams across multiple conferences who play at most 13 games, with very little crossover. Some conferences are so large that teams don’t play everyone in their own conference: the Atlantic Coast Conference (ACC) has 17 teams but plays only 8 conference games a year.
Not all wins and losses are considered equal. Because most highly sought-after recruits go to teams in the Big 10, Big 12, Southeastern Conference (SEC) and, to a lesser extent, the ACC (collectively, “the Power 4”), it’s easy to overrate individual teams and victories in those conferences. Similarly, it’s hard to judge the overall strength of a team from the American, Conference USA (CUSA), Mid-American, Mountain West or Sun Belt conferences because their opponents don’t often offer the same caliber of competition. As a result, losses to teams in these five conferences are treated as more damaging than losses to tougher competition in the Power 4 (and are weighted as such).
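To make the weighting idea concrete, here is a minimal sketch of a résumé score in which a loss to one of those five conferences costs more than a loss to a Power 4 opponent. The tier labels, the credit and penalty weights, and the names `Game`, `tier` and `resume_score` are all hypothetical illustrations; they are not the values used by the AP poll or the CFP selection committee.

```python
from dataclasses import dataclass

# Hypothetical tiers and weights for illustration only; these are not the
# values used by the AP poll or the CFP selection committee.
POWER_4 = {"Big 10", "Big 12", "SEC", "ACC"}
GROUP_OF_5 = {"American", "CUSA", "Mid-American", "Mountain West", "Sun Belt"}
WIN_CREDIT = {"power4": 1.0, "group5": 0.6, "other": 0.8}
LOSS_PENALTY = {"power4": 1.0, "group5": 1.5, "other": 1.25}  # weaker opponent, bigger penalty


@dataclass
class Game:
    opponent_conference: str
    won: bool


def tier(conference: str) -> str:
    """Map an opponent's conference to a coarse strength tier."""
    if conference in POWER_4:
        return "power4"
    if conference in GROUP_OF_5:
        return "group5"
    return "other"  # independents and any conference not listed above


def resume_score(games: list[Game]) -> float:
    """Add tier-weighted credit for wins; subtract tier-weighted penalties for losses."""
    score = 0.0
    for game in games:
        t = tier(game.opponent_conference)
        score += WIN_CREDIT[t] if game.won else -LOSS_PENALTY[t]
    return score
```

For example, two teams with identical wins over SEC and Big 12 opponents separate only by their loss: a loss to a Sun Belt team leaves a score of 0.5, while a loss to a Big 10 team leaves 1.0.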
With college players constantly moving between teams through the transfer portal, a school’s performance in the prior year has become less indicative of success in the following one. Pre-season rankings are based on the aggregate ability of the individual players on a team’s roster, not on actual data about that team’s overall strength, cohesiveness and performance. Pre-season rankings create some bias toward the end result (starting in the top 10 makes it easier to end up there), and there is often recency bias with respect to when teams lose games. Teams with early-season losses have time to recover and reclaim their prior ranking; teams that lose later in the season often find themselves on the outside of the playoff looking in. Conference championship games are especially nerve-wracking if one or both teams are on the bubble for the national championship bracket (and yet there is still an intense desire to play that extra game).
There are also four major teams (Notre Dame, Connecticut, Washington State, and Oregon State) that don’t currently play in a conference. The first two are independent of conference play, and the other two are holdovers from the old Pac-12 conference.
To stay on task, we decided to take the Associated Press (AP) Top 25 as a given. For the reasons described above, we know it’s biased to some degree and is based on a fair amount of incomplete information subject to interpretation. Perhaps in a future exercise we will consider better systems to identify the top teams in college football, especially at the margin of the tournament selection list.
We also encourage Notre Dame and others to join a conference. They might be independent of a conference, but they can’t possibly be independent of the system (which may have actually cost them a chance to play for a national championship).
2. What specifications did we consider?
To keep our system on track toward the desired objective, we considered specifications for both team selection and the tournament format. Some of these might be obvious to a human, but remember that agentic AI models have only the context and boundaries we provide.
Selection and Seeding Criteria
A total of 24 teams must be selected (reward set to 0 if violated)
Must include the Associated Press (AP) Top 16 (reward set to 0 if violated)
Changing the order of the AP Top 16 is allowed, but the reward function loses points, and the loss compounds exponentially with larger shifts (moving Ohio State from #2 to #7 leads to an x^5 loss in points)
All conference champions must be invited (reward set to 0 if violated)
If slots are still available after adding the Top 16 and the conference champions (there may be overlap), those slots go to the conference runner-up ranked highest in the Top 25 before the conference championship game (and then the next highest, while slots remain).
If slots still remain after this condition is satisfied, the highest-ranked team in the AP Top 25 not yet included is invited
Limit west coast teams traveling to the east coast (and vice versa) in the first round
Maximize regional matchups in the first round to further reduce long-distance travel and simplify scheduling.
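The selection criteria above translate fairly directly into code. The sketch below is a hypothetical illustration, not the implementation behind our results: it assumes teams are plain name strings, a reward ceiling of 100 points, and a penalty base of x = 2, and it leaves out the travel and regional-matchup preferences entirely. The names `select_field`, `reward` and `runner_up_pre_ccg_rank` are ours, introduced for illustration.

```python
# Hypothetical sketch of the selection and reward specifications.
FIELD_SIZE = 24     # "A total of 24 teams must be selected"
LOCKED_TOP = 16     # "Must include the AP Top 16"
MAX_POINTS = 100.0  # assumed reward ceiling (illustrative only)
SHIFT_BASE = 2.0    # assumed value of x in the x**shift penalty (illustrative only)


def select_field(ap_top25, champions, runner_up_pre_ccg_rank):
    """Fill the field in the priority order of the selection criteria.

    ap_top25: team names ordered by AP rank (index 0 is #1).
    champions: every conference champion, ranked or not.
    runner_up_pre_ccg_rank: {team: AP rank before its championship game}
        for conference runners-up who were in the Top 25.
    """
    field = list(ap_top25[:LOCKED_TOP])

    def add(team):
        if team not in field:
            field.append(team)

    # Every champion is invited, even if that overflows the field;
    # the reward function then scores the result as 0.
    for team in champions:
        add(team)
    # Open slots go to the highest-ranked conference runners-up first...
    for team in sorted(runner_up_pre_ccg_rank, key=runner_up_pre_ccg_rank.get):
        if len(field) < FIELD_SIZE:
            add(team)
    # ...then to the highest-ranked AP Top 25 teams not yet included.
    for team in ap_top25:
        if len(field) < FIELD_SIZE:
            add(team)
    return field


def reward(seeding, ap_top25, champions):
    """Score a seeded field (index 0 is the #1 seed)."""
    teams = set(seeding)
    if len(seeding) != FIELD_SIZE or len(teams) != FIELD_SIZE:
        return 0.0  # hard constraint: exactly 24 distinct teams
    if not set(ap_top25[:LOCKED_TOP]) <= teams:
        return 0.0  # hard constraint: the AP Top 16 are all in
    if not set(champions) <= teams:
        return 0.0  # hard constraint: every conference champion is in
    penalty = 0.0
    for ap_rank, team in enumerate(ap_top25[:LOCKED_TOP], start=1):
        shift = abs(seeding.index(team) + 1 - ap_rank)
        if shift:
            penalty += SHIFT_BASE ** shift  # e.g. #2 -> #7 costs x**5
    return max(0.0, MAX_POINTS - penalty)
```

With placeholder team names, an AP-ordered field scores the full 100 points; swapping the #2 and #7 seeds moves both teams five spots and costs 2^5 + 2^5 = 64 points, and leaving out any conference champion drops the reward to 0. In this sketch, the ordering of seeds 17 through 24 is never penalized.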
Tournament Criteria
Games are only allowed on Friday and Saturday (Sundays conflict with the NFL)
There must be at least six calendar days between games for a specific team
Each game must have at least two hours where it does not conflict with another college football game
The top eight seeds receive first round byes
There are no byes after the first round
The tournament lasts five weeks and is single-elimination
Games must maximize viewership (avoid game times that preclude one U.S. coast or the other, such as starting a game any time after 10:00 p.m. EST or before 12:00 p.m. EST)
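As a quick consistency check on the tournament criteria, this hypothetical sketch pairs the non-bye seeds highest versus lowest and counts how many teams play in each round. With 24 teams and eight byes, 16 teams play in round 1, the eight winners join the eight bye teams, and the field halves every week after that, which gives exactly five rounds. The standard highest-versus-lowest pairing is our assumption; the regional-matchup preference could override it in practice. The function names are illustrative.

```python
# Hypothetical sketch: bracket structure implied by the tournament criteria.


def first_round_pairings(field_size=24, byes=8):
    """Pair the non-bye seeds highest versus lowest (9 vs. 24, 10 vs. 23, ...)."""
    playing = list(range(byes + 1, field_size + 1))
    if len(playing) % 2:
        raise ValueError("an odd number of first-round teams cannot all be paired")
    return [(playing[i], playing[-1 - i]) for i in range(len(playing) // 2)]


def teams_per_round(field_size=24, byes=8):
    """Teams playing in each round, with byes allowed only in round 1."""
    rounds = [field_size - byes]      # round 1: only the non-bye seeds play
    if rounds[0] % 2:
        raise ValueError("an odd number of first-round teams cannot all be paired")
    alive = byes + rounds[0] // 2     # bye teams join the round 1 winners
    while alive > 1:
        if alive % 2:
            raise ValueError("no byes after round 1, so every later round must be even")
        rounds.append(alive)
        alive //= 2                   # single elimination halves the field
    return rounds
```

Here `teams_per_round()` returns `[16, 16, 8, 4, 2]`: five rounds for a five-week, single-elimination tournament.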
3. What did we exclude?
We didn’t consider what happens to the other bowl games – those 32-40 teams could form a separate tournament (akin to the NIT college basketball tournament for teams who don’t make it to March Madness), or they could keep the remaining structure of exhibition bowl games.
We also didn’t allow our agentic AI agent to run thousands of game simulations to fully re-rank the participants (and perhaps settle, or further inflame, this year’s Miami vs. Notre Dame debate). Sportsbooks use complex algorithms to set the point spread for each game; in the future, it’s not a far reach to adapt those models, along with actual game outcomes, for this purpose (potentially simulating matchups that never happened).
4. What was the result?
The top eight seeds (who received byes) were largely unchanged: Indiana, Ohio State, Georgia, Texas Tech, Oregon, Ole Miss, Texas A&M and Oklahoma.
In addition to the rest of the AP Top 16 (Alabama, Miami, Notre Dame, BYU, Texas, Vanderbilt, Utah, USC), the remaining conference champions (Tulane, James Madison, Duke, Boise State, Western Michigan, and Kennesaw State) and two at-large teams that lost their conference championship games but were in the consensus AP Top 25 (Virginia, North Texas) rounded out the field of 24.
Fan bases from Arizona (Big 12), Michigan (Big 10), and Houston (Big 12) wouldn’t be sending us Christmas cards, as they would be the last teams out of the tournament.
The playoff started immediately, with the first games taking place the week after the conference championship games (thus eliminating the long inactive periods that might lead to sub-optimal play). An exciting set of week 1 matchups was slated for Friday, December 12th and Saturday, December 13th:
Alabama (9) vs. Kennesaw State (24) – Friday, 12:00 p.m. EST
Miami (10) vs. Western Michigan (23) – Friday, 3:00 p.m. EST
Notre Dame (11) vs. Duke (22) – Friday, 6:00 p.m. EST
BYU (12) vs. Boise State (21) – Friday, 9:00 p.m. EST
Texas (13) vs. North Texas (20) – Saturday, 2:00 p.m. EST
Vanderbilt (14) vs. Virginia (19) – Saturday, 5:00 p.m. EST
Tulane (15) vs. James Madison (18) – Saturday, 8:00 p.m. EST
USC (16) vs. Utah (17) – Saturday, 10:00 p.m. EST
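The week 1 slate can be checked mechanically against the tournament criteria. This sketch rests on two assumptions of ours: each game occupies a three-hour window, and “at least two hours where it does not conflict” means a contiguous two-hour stretch. A longer assumed game length would change the results. The sketch counts the Army-Navy game at 12:00 p.m. Saturday as a conflicting game even though it is not part of the playoff. The constants and function names are hypothetical.

```python
# Hypothetical checker for the week 1 slate; durations are assumptions.
GAME_MINUTES = 180        # assumed length of a game broadcast (3 hours)
MIN_CLEAR_MINUTES = 120   # "at least two hours" with no other game in progress
EARLIEST_START = 12 * 60  # 12:00 p.m. EST, as minutes after midnight
LATEST_START = 22 * 60    # 10:00 p.m. EST
ALLOWED_DAYS = {"Fri", "Sat"}

# (matchup, day, kickoff in minutes after midnight EST).
# Army-Navy is not a playoff game, but it still counts as a conflicting game.
SLATE = [
    ("Alabama vs. Kennesaw State", "Fri", 12 * 60),
    ("Miami vs. Western Michigan", "Fri", 15 * 60),
    ("Notre Dame vs. Duke", "Fri", 18 * 60),
    ("BYU vs. Boise State", "Fri", 21 * 60),
    ("Army vs. Navy", "Sat", 12 * 60),
    ("Texas vs. North Texas", "Sat", 14 * 60),
    ("Vanderbilt vs. Virginia", "Sat", 17 * 60),
    ("Tulane vs. James Madison", "Sat", 20 * 60),
    ("USC vs. Utah", "Sat", 22 * 60),
]


def longest_clear_window(index, slate):
    """Longest run of minutes in game `index` with no other game in progress."""
    _, day, start = slate[index]
    others = [
        (s, s + GAME_MINUTES)
        for i, (_, d, s) in enumerate(slate)
        if i != index and d == day
    ]
    best = run = 0
    for minute in range(start, start + GAME_MINUTES):
        busy = any(s <= minute < e for s, e in others)
        run = 0 if busy else run + 1
        best = max(best, run)
    return best


def violations(slate):
    """Return (matchup, reason) pairs for games that break a criterion."""
    problems = []
    for i, (name, day, start) in enumerate(slate):
        if day not in ALLOWED_DAYS:
            problems.append((name, "not played on Friday or Saturday"))
        if not EARLIEST_START <= start <= LATEST_START:
            problems.append((name, "kickoff outside 12:00-10:00 p.m. EST"))
        if longest_clear_window(i, slate) < MIN_CLEAR_MINUTES:
            problems.append((name, "under two hours without a conflict"))
    return problems
```

Under these assumptions, `violations(SLATE)` is empty, and several games clear the two-hour bar by exactly two hours. If Texas vs. North Texas moves to 12:00 p.m., both it and Army-Navy fail, which is consistent with the model pushing that game to 2:00 p.m. The six-day rest rule spans multiple weeks, so this sketch does not check it.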
5. What did we miss in our specifications?
To sell ESPN, ABC, and FOX on this proposal, there would need to be a ratings maximization specification. I appreciate that the west coast teams are playing in the later time slots, but I can already hear Alabama fans screaming for the Saturday prime time game (especially since they are in the Friday afternoon game).
Our specifications did not explicitly account for the historic Army-Navy game, which takes place at 12:00 p.m. on the Saturday following conference championship weekend. Because of the two-hour non-conflict criterion, the model did push the start time of the Texas-North Texas game to 2:00 p.m. While that only causes a scheduling adjustment this year, it would create pandemonium if Army or Navy made the national tournament by winning the American Athletic Conference or finishing in the Top 16.
While this outcome is more inclusive of schools typically not invited to the college football playoff, it’s not clear that we invited the “best” 24 schools. In executing this proposal, it’s possible that we just moved the debate from Miami vs. Notre Dame to Arizona vs. Virginia or Michigan vs. North Texas for the final slots in our tournament. The debate for teams left out will always exist, regardless of whether the CFP includes 4, 8, 12, 24, or even 32 teams.
Introducing more inter-conference games throughout the season (not just in week 1, before teams have had a chance to play together) would add vital data points for comparing teams and conferences, and game simulations could enhance the existing rankings analysis.
Summary
Through this thought experiment, we created a proposal in which the specifications for tournament entry are easier to understand, though the question remains whether they are the right specifications. Relying on a data set like the AP Top 25 imports whatever bias it contains into this model, and it’s not clear whether we did enough to balance that bias with other mitigating criteria. That said, the result is a tournament where these teams can prove themselves on the field (not just in analytics) after earning the right to do so through their regular-season performance.
Being the biased architect of this design, I would absolutely watch these games, and I suspect viewership and attendance would grow as additional audiences tune in to the exciting conclusion of the college football season. This proposal may also have helped teams like Ohio State, Georgia, and Texas Tech, which in the actual CFP hadn’t played a game for three weeks (and lost their first game).
Do you agree? How would you change these specifications? How would you design a model to yield the “ideal” college football playoff? Let us know your thoughts!
Follow NAQF on LinkedIn for additional insights. For more information on how NAQF can help your organization with model development, artificial intelligence models, or model testing, contact us at contact@naqf.org.