How these numbers are made
A walk through the model, in plain English.
The idea
Runs per game in baseball can be modeled reasonably well with a Poisson distribution: a team gets many roughly independent chances to score, each with a small probability of succeeding. So if we can estimate the average number of runs a team is expected to score in a given matchup (we call this number λ, lambda), we can sample plausible final scores by drawing Poisson random numbers, then repeat that thousands of times to build a picture of how the game could go.
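For reference, this is the Poisson probability of a team scoring exactly k runs when its expected runs are λ. The worked values use an illustrative λ of 4.5, an assumption chosen for the example rather than a number from the model.

```latex
P(k \mid \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots

% Illustrative only: \lambda = 4.5
P(0 \mid 4.5) = e^{-4.5} \approx 0.011, \qquad
P(4 \mid 4.5) = \frac{4.5^{4}\, e^{-4.5}}{4!} \approx 0.190
```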
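The methodology here is adapted from a soccer model popularized in David Sumpter's book Soccermatics. Baseball runs are a looser Poisson fit than soccer goals, because they often arrive in bunches (one swing can drive in several), but the distribution is still a workable baseline.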
Step by step
- Pull every completed regular-season game this season from the official MLB Stats API.
- For each team, compute four splits: average runs scored at home, average runs scored on the road, average runs allowed at home, average runs allowed on the road.
- Park-adjust the home numbers. Some venues naturally inflate offense (Coors Field, Fenway Park, Great American Ball Park) while others suppress it (Oracle Park, T-Mobile Park, Petco Park). We divide each team's home stats by its ballpark's 3-year run factor so we're comparing apples to apples across the league.
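- Compute four strength ratios per team, each relative to the park-neutral league average: home attack, away attack, home defense, away defense. For attack, a value of 1.20 means the team scores 20% more than league average in that split and 0.80 means 20% less. Defense ratios are built from runs allowed, so there 1.20 means the team allows 20% more than average (worse) and 0.80 means 20% fewer (better).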
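- For an upcoming game, multiply the league average by the home team's home attack and the visiting team's away defense to get the home team's expected runs, and by the visitor's away attack and the home team's home defense to get the visitor's. Then re-apply the home venue's park factor to both, since both teams are playing in that park. That produces an expected-runs (λ) value for each team.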
- Sample 10,000 Poisson draws for each team's run total. From those simulated games, count how often each side wins, find the most likely final scores, build the distribution of total runs, and compute over/under rates at common lines.
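The steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the site's pipeline: the input format (a list of `(home, away, home_runs, away_runs)` tuples), the `park` dictionary of run factors, the function names, and the over/under lines 7.5, 8.5 and 9.5 are all hypothetical, and each split is compared with a park-neutral league average of home-team or road-team runs. It skips the MLB Stats API fetch and uses Knuth's multiplication method for Poisson draws so it needs only the standard library.

```python
import math
import random
from collections import Counter, defaultdict


def team_strengths(games, park):
    """Four strength ratios per team, plus park-neutral league averages.

    Hypothetical inputs: games is a list of (home, away, home_runs, away_runs);
    park maps each team to its home park's run factor (1.0 = neutral).
    """
    home = defaultdict(lambda: [0.0, 0.0, 0])  # adjusted runs scored, allowed, games
    road = defaultdict(lambda: [0.0, 0.0, 0])  # raw runs scored, allowed, games
    for h, a, hr, ar in games:
        pf = park[h]
        # Only home splits are park-adjusted; road splits stay raw, on the
        # assumption that a season of road games averages out across parks.
        home[h][0] += hr / pf
        home[h][1] += ar / pf
        home[h][2] += 1
        road[a][0] += ar
        road[a][1] += hr
        road[a][2] += 1
    n = len(games)
    avg_home = sum(hr / park[h] for h, _, hr, _ in games) / n  # neutral home-team runs/game
    avg_road = sum(ar / park[h] for h, _, _, ar in games) / n  # neutral road-team runs/game
    ratios = {}
    for team in set(home) & set(road):  # teams with both home and road games
        hs, ha, hg = home[team]
        rs, ra, rg = road[team]
        ratios[team] = {
            "home_att": hs / hg / avg_home,
            "home_def": ha / hg / avg_road,  # allowed at home vs. typical visitor output
            "away_att": rs / rg / avg_road,
            "away_def": ra / rg / avg_home,  # allowed on road vs. typical home output
        }
    return ratios, avg_home, avg_road


def expected_runs(home_team, away_team, ratios, avg_home, avg_road, park):
    """Lambda for each side, with the home venue's park factor re-applied."""
    h, a = ratios[home_team], ratios[away_team]
    pf = park[home_team]
    lam_home = avg_home * h["home_att"] * a["away_def"] * pf
    lam_away = avg_road * a["away_att"] * h["home_def"] * pf
    return lam_home, lam_away


def poisson_draw(lam, rng):
    """One Poisson(lam) sample via Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate(lam_home, lam_away, n=10_000, lines=(7.5, 8.5, 9.5), seed=None):
    """Simulate n games from independent per-team draws and summarize them."""
    rng = random.Random(seed)
    games = [(poisson_draw(lam_home, rng), poisson_draw(lam_away, rng)) for _ in range(n)]
    totals = Counter(h + a for h, a in games)
    return {
        "home_win": sum(h > a for h, a in games) / n,
        "away_win": sum(a > h for h, a in games) / n,
        "extras_needed": sum(h == a for h, a in games) / n,  # tied after 9
        "top_scores": Counter(games).most_common(5),
        "total_runs": {t: c / n for t, c in sorted(totals.items())},
        "over": {line: sum(h + a > line for h, a in games) / n for line in lines},
    }


if __name__ == "__main__":
    # Tiny made-up two-team league, for illustration only.
    games = [("A", "B", 5, 3), ("B", "A", 4, 6), ("A", "B", 2, 1), ("B", "A", 7, 2)]
    park = {"A": 1.10, "B": 0.95}
    ratios, avg_home, avg_road = team_strengths(games, park)
    lam_h, lam_a = expected_runs("A", "B", ratios, avg_home, avg_road, park)
    result = simulate(lam_h, lam_a, seed=42)
    print(round(lam_h, 2), round(lam_a, 2), result["home_win"], result["extras_needed"])
```

In this sketch λ is computed once per matchup and then handed to `simulate`; raising `n` tightens the estimates at the cost of run time.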
What this model doesn't know
This is a baseline. It explicitly does not use:
- Starting pitcher identity: a Cy Young winner and a spot starter look the same to it. A separate pitcher model is coming.
- Lineup composition: no platoon splits, no rest days for stars.
- Recent form: a team on a 7-game winning streak is rated only on its season-to-date numbers.
- Weather: wind at Wrigley or rain at Citi Field can shift expected runs meaningfully.
- Bullpen quality: bullpen runs are folded into each team's season-long runs allowed, with no adjustment for late-inning leverage.
- Injuries: if a team's best hitter just went on the IL, the model doesn't know yet.
Why the site is hidden before May
Early in the season, sample sizes are tiny. One bad week of weather or a few extreme blowouts can distort a team's apparent ability. We wait until enough games have been played for the numbers to start meaning something: early May, or roughly 30 or more games per team.
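One way to see why a floor of roughly 30 games makes sense: under the model's own Poisson assumption, a team's average over n games has a standard error of √(λ/n). The worked numbers use an illustrative λ of 4.5 runs per game, and reflect that each home or road split holds only about half of a team's games.

```latex
\mathrm{SE}(\bar{x}) = \sqrt{\frac{\lambda}{n}}

% Illustrative: \lambda = 4.5 runs per game
n = 5:\quad \sqrt{4.5/5} = \sqrt{0.9} \approx 0.95
\qquad
n = 15:\quad \sqrt{4.5/15} = \sqrt{0.3} \approx 0.55
```

A split built from about 5 games is uncertain by nearly a full run per game; at about 15 games per split the noise drops to roughly half a run, still around 12% of λ.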
How "win probability" relates to "extras needed"
The Poisson model returns three outcomes per simulated game: home wins in regulation, away wins in regulation, or tied at the end of 9 innings ("extras needed"). Real MLB games can't end tied, but the model only draws a final run total for each team and knows nothing about innings, so we report the regulation-tie share separately rather than splitting it 50/50 between the teams. The three shares add up to 100%.
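Assuming the two run totals are drawn independently, as separate per-team draws suggest, the three shares can also be computed exactly by summing the joint probability table; the "extras needed" share is the probability mass on the diagonal where both totals are equal. The sketch below does that as a cross-check on the simulation, not necessarily how the site computes its numbers; `regulation_split`, `max_runs`, and the λ values 4.5 and 4.0 are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson(lam) run total."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def regulation_split(lam_home, lam_away, max_runs=40):
    """Exact (home_win, away_win, extras_needed) shares for independent Poisson totals.

    Sums the joint probability table up to max_runs per team; for typical
    MLB-sized lambdas the probability mass beyond 40 runs is negligible.
    """
    ph = [poisson_pmf(k, lam_home) for k in range(max_runs + 1)]
    pa = [poisson_pmf(k, lam_away) for k in range(max_runs + 1)]
    home_win = away_win = extras = 0.0
    for i, p in enumerate(ph):
        for j, q in enumerate(pa):
            if i > j:
                home_win += p * q
            elif i < j:
                away_win += p * q
            else:
                extras += p * q  # tied after 9: the "extras needed" share
    return home_win, away_win, extras


if __name__ == "__main__":
    h, a, t = regulation_split(4.5, 4.0)  # illustrative lambdas
    print(f"home {h:.3f}  away {a:.3f}  extras needed {t:.3f}")
```

With 10,000 simulated games, the simulated shares should land close to these exact values; a large gap would point to a bug in the sampling step.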
Source & transparency
All data comes from the official MLB Stats API. The underlying methodology is inspired by Soccermatics (David Sumpter) and the Numberphile Poisson football simulation. The pipeline rebuilds daily after games complete.