Symptom
The test shows a statistically significant lift at 42% of the planned sample, which is too early for the significance reading to be reliable.
You asked
Within two weeks of the full launch, conversion has declined six percent from the pre-test baseline. The apparent winner was a false positive caused by early stopping.
Cause
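Real-time dashboards display significance indicators prominently while burying sample completion rates, which invites stopping at the first significant reading. Each interim check is another chance to cross the significance threshold by chance alone, so the cumulative false positive rate rises above the stated alpha.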
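The inflation from repeated interim checks can be illustrated with a small simulation. This is a minimal sketch, not the dashboard's actual method: it assumes an A/A test (both arms share the same true conversion rate), a pooled two-proportion z-test, and evenly spaced looks. The function names, sample sizes, and the 10% base rate are hypothetical values chosen for illustration.

```python
import math
import random


def z_stat(conv_a, n_a, conv_b, n_b):
    """Two-proportion z statistic using the pooled conversion rate."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0  # no variation observed yet, so no evidence either way
    return (conv_b / n_b - conv_a / n_a) / se


def false_positive_rates(n_per_arm=2000, looks=10, base_rate=0.10,
                         z_crit=1.96, n_sims=1000, seed=0):
    """Simulate A/A tests and return (rate_with_peeking, rate_at_final_look).

    Both arms have the same true rate, so every "significant" result
    counted here is a false positive. All defaults are illustrative.
    """
    rng = random.Random(seed)
    step = n_per_arm // looks  # visitors added per arm between looks
    peeking_hits = 0
    final_hits = 0
    for _sim in range(n_sims):
        conv_a = conv_b = 0
        crossed = False
        significant = False
        for look in range(1, looks + 1):
            for _visitor in range(step):
                conv_a += int(rng.random() < base_rate)
                conv_b += int(rng.random() < base_rate)
            n = look * step
            significant = abs(z_stat(conv_a, n, conv_b, n)) > z_crit
            crossed = crossed or significant  # stop-at-first-significance rule
        peeking_hits += int(crossed)
        final_hits += int(significant)  # only the planned final analysis
    return peeking_hits / n_sims, final_hits / n_sims


if __name__ == "__main__":
    peeking, final_only = false_positive_rates()
    print(f"false positive rate, stopping at any significant look: {peeking:.3f}")
    print(f"false positive rate, single final analysis:            {final_only:.3f}")
```

Under these assumptions the single final analysis should land near the nominal 5%, while the stop-at-any-look rule can only match or exceed it, because every run that is significant at the final look also counts as a hit under peeking. Raising `looks` makes the gap wider.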
Impact
The shipped variant produces a post-launch conversion decline because the observed lift was early-stage variance rather than a true treatment effect.