Methodology

How FoulTruth_Ai analyzes officiating calls, scores confidence, and surfaces patterns — explained in plain language.

How calls are judged

For each call in our database, we assemble structured context (game state, players involved, official assignment) and unstructured context (Last Two Minute (L2M) report excerpts, play-by-play descriptions, and video when available). Multiple AI models then independently produce a judgement: correct, incorrect, missed, or inconclusive. The most recent judgement is the one displayed.
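As a rough illustration of the record described above, the sketch below models a judgement and the rule that the most recent one is displayed. The class names, field names, and the `created_at` timestamp are assumptions made for this example, not the site's actual schema.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Sequence


class Verdict(Enum):
    """The four possible outcomes of a judgement."""
    CORRECT = "correct"
    INCORRECT = "incorrect"
    MISSED = "missed"
    INCONCLUSIVE = "inconclusive"


@dataclass(frozen=True)
class Judgement:
    model_name: str        # hypothetical identifier for the model that produced it
    verdict: Verdict
    confidence: int        # 0 to 100
    created_at: datetime   # assumed field used to decide which judgement is latest


def displayed_judgement(judgements: Sequence[Judgement]) -> Judgement:
    """Return the most recent judgement, which is the one shown."""
    if not judgements:
        raise ValueError("no judgements recorded for this call")
    return max(judgements, key=lambda j: j.created_at)
```

Under these assumptions, earlier judgements stay in the record and only the choice of which one to show depends on the timestamp.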

Confidence scoring

Each judgement includes a confidence score from 0 to 100. A higher score means the available evidence supports the verdict more strongly. Low confidence is shown explicitly, and we never hide uncertainty. The score is not a probability; it is an analytical estimate from the model.

Provenance & sources

Every judgement is paired with linked sources: the L2M report row, the play-by-play event, the box score, or the original video. If a source cannot be cited, the judgement is marked accordingly and its confidence score is lowered.
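The downgrade rule can be sketched as follows. The penalty of 30 points, the `unsourced` flag, and every name in this block are assumptions chosen for illustration; the size of the actual downgrade is not stated here.

```python
from dataclasses import dataclass, replace
from typing import Tuple

# Assumed value for illustration only; the real downgrade amount is not stated.
UNSOURCED_PENALTY = 30


@dataclass(frozen=True)
class SourcedJudgement:
    verdict: str
    confidence: int                 # 0 to 100
    sources: Tuple[str, ...] = ()   # links to L2M rows, play-by-play events, etc.
    unsourced: bool = False         # hypothetical "marked accordingly" flag


def apply_provenance(j: SourcedJudgement) -> SourcedJudgement:
    """Flag a judgement with no citable source and lower its confidence."""
    if j.sources:
        return j  # at least one source is linked, so nothing changes
    lowered = max(0, j.confidence - UNSOURCED_PENALTY)  # never drop below 0
    return replace(j, confidence=lowered, unsourced=True)
```

Returning a new record rather than mutating the old one keeps the original model output available for audit, which seems in keeping with the emphasis on provenance.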

L2M usage

The NBA's Last Two Minute (L2M) reports are an authoritative public source for late-game review. We ingest them, link each row to its underlying call where possible, and use them as primary evidence when judging the late-game plays they cover.

What "Bias Score" means

A bias score is an aggregate analytical estimate, not an accusation. It compares the rate at which an official's calls go for or against a given team, player, or context with a baseline rate. A non-zero bias score reflects a statistical pattern in our judged sample, not intent or wrongdoing.
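One simple way to express "a rate relative to a baseline" is a difference of proportions. The sketch below uses that form purely as an assumption; the exact formula behind the bias score is not given here, and the function and parameter names are hypothetical.

```python
from typing import Optional


def bias_score(calls_against: int, total_calls: int, baseline_rate: float) -> Optional[float]:
    """Observed rate of calls against a subject minus a baseline rate.

    Assumed formula for illustration. Returns None when there are no
    judged calls, so an empty sample is never reported as zero bias.
    """
    if total_calls <= 0:
        return None
    if not 0 <= calls_against <= total_calls:
        raise ValueError("calls_against must be between 0 and total_calls")
    observed_rate = calls_against / total_calls
    return observed_rate - baseline_rate
```

For example, 12 calls against a team out of 40 judged calls, compared with a baseline rate of 0.25, gives a score of roughly 0.05 under this assumed formula. A positive value means more calls against than the baseline would suggest; it says nothing about why.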

How foul differential works

Foul differential compares how many fouls were called on each team in a game, normalized for pace and possessions where the data allows. Unusual differentials are flagged for review; they are a signal worth inspecting, not a verdict. A high differential alone is never sufficient to claim wrongdoing.
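A minimal sketch of the normalization, assuming fouls are expressed per 100 possessions when a possession count is available and left as a raw difference otherwise. The per-100 convention and all names here are assumptions for illustration.

```python
from typing import Optional, Tuple


def foul_differential(
    fouls_a: int,
    fouls_b: int,
    possessions: Optional[float] = None,
) -> Tuple[float, bool]:
    """Return (differential, normalized).

    With a positive possession count, the differential is scaled to
    fouls per 100 possessions. Without one, the raw difference is
    returned and the second element is False.
    """
    raw = fouls_a - fouls_b
    if possessions is None or possessions <= 0:
        return float(raw), False
    return 100.0 * raw / possessions, True
```

Returning the `normalized` flag alongside the value keeps raw and pace-adjusted figures from being compared against the same review threshold by accident.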

What "Confidence" means

Separately from the 0 to 100 score on each individual judgement, aggregate analytics carry one of four qualitative confidence labels driven by sample size and stability:

  • High confidence: large, stable sample (≥ 60 calls, low variance).
  • Medium confidence: moderate sample (20 to 59 calls), or a larger sample with elevated variance.
  • Low confidence: small sample (5 to 19 calls), or high variance.
  • Limited sample size: fewer than 5 judged calls. We deliberately refuse to fabricate precision.
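The thresholds above can be read as a small decision function. In the sketch below, the three variance levels ("low", "elevated", "high") are a hypothetical simplification, since how variance is measured is not specified here, and a moderate sample with elevated variance is assumed to stay at Medium.

```python
def confidence_label(n_calls: int, variance_level: str) -> str:
    """Map sample size and a coarse variance level to a confidence label.

    variance_level is a hypothetical input: "low", "elevated", or "high".
    """
    if variance_level not in ("low", "elevated", "high"):
        raise ValueError("variance_level must be 'low', 'elevated', or 'high'")
    if n_calls < 5:
        return "Limited sample size"   # too few judged calls to label
    if variance_level == "high" or n_calls < 20:
        return "Low confidence"        # small sample or high variance
    if n_calls >= 60 and variance_level == "low":
        return "High confidence"       # large, stable sample
    return "Medium confidence"         # moderate sample, or large with elevated variance
```

Checking the sample-size floor first means a very small sample is always reported as limited, regardless of how stable it looks.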

What AI does and does NOT do

AI does
  • Explain a call against rule context
  • Summarize patterns across many judged calls
  • Contextualize a result with comparable history
  • Interpret play-by-play and L2M evidence
AI does NOT
  • Invent evidence that is not in our sources
  • Override the underlying statistical results
  • Accuse any official, player, or team of wrongdoing
  • Replace human judgement on contested plays

Language we use — and avoid

We deliberately avoid sensational framing. Patterns are described as anomalies, variance, statistical deviation, or unusual trends. We do not use words like rigged, corrupt, or fixed, because none of those claims are supported by our analysis, and none of them are appropriate for officiating professionals doing a difficult job in real time.

Limitations

  • AI can be wrong. Judgements are model output, not ground truth.
  • Coverage is incomplete: not every play in every league has full evidence.
  • Bias scores are analytical estimates from a finite sample. They are not legal, professional, or officiating findings.
  • Scores and judgements are not accusations against any official, player, or team.
  • This site is independent and not affiliated with any league or governing body.