You asked

Why does my A/B test winner stop working after I launch it?

Within two weeks of the full launch, conversion has declined six percent from the pre-test baseline. The apparent winner was a false positive produced by stopping the test early.

Symptom

The test shows a statistically significant lift at 42% of the planned sample size, before enough data has accumulated for that reading to be reliable.

Cause

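Real-time dashboards display significance indicators prominently while burying sample completion rates, which invites stopping at the first significant reading. Each unadjusted interim check is another chance to cross the significance threshold by luck, so the cumulative false positive rate rises above the stated alpha.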
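
The inflation from repeated checks can be illustrated with a small simulation. This is a minimal sketch, not a model of any particular dashboard: it assumes a two-sided alpha of 0.05, equally spaced checks, no true difference between variants (an A/A test), and the usual normal approximation in which each new batch of data adds an independent standard-normal increment to the running test statistic. The function name false_positive_rate and its parameters are hypothetical, chosen for illustration.

```python
import math
import random

# Two-sided critical value for alpha = 0.05 (assumed threshold)
Z_CRIT = 1.959964


def false_positive_rate(n_looks, n_sims=20000, seed=0):
    """Estimate how often a test with NO true effect is declared a winner
    when it is checked n_looks times and stopped at the first
    significant reading."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        running_sum = 0.0
        for look in range(1, n_looks + 1):
            # Each batch of new data adds an independent N(0, 1) increment
            running_sum += rng.gauss(0.0, 1.0)
            # z-statistic on all data collected up to this look
            z = running_sum / math.sqrt(look)
            if abs(z) > Z_CRIT:
                hits += 1  # stopped early on a spurious "winner"
                break
    return hits / n_sims


for looks in (1, 5, 20):
    print(looks, round(false_positive_rate(looks), 3))
```

With a single check at the end, the estimated rate should sit near the nominal 5%. With more checks it climbs well above that, even though no real effect exists in any simulated test.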

Impact

Conversion declines after the variant ships because the lift observed during the test was early-stage sampling variance rather than a true treatment effect.
