A feature or experience is rolled out to a non-random subset of users — typically the most engaged, most recently active, or most likely to respond — and the performance results from that rollout are treated as representative of the full user base. This selection bias inflates the observed effect, so the full rollout underperforms the test results without an obvious explanation.
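The mechanism can be illustrated with a small simulation. The numbers below are illustrative assumptions, not measurements: each user's uplift from the feature is assumed to scale with their engagement, so a rollout targeted at the most engaged users overstates the population-average effect.

```python
import random

random.seed(0)

# Hypothetical population: engagement scores drawn uniformly from [0, 1).
N = 100_000
users = [random.random() for _ in range(N)]

def uplift(engagement):
    # Assumed response model: more engaged users respond more strongly.
    return 0.02 + 0.10 * engagement

# True population-average effect of a full rollout.
population_avg = sum(uplift(e) for e in users) / N

# Staged rollout to the top 10% most engaged users only.
rollout = sorted(users, reverse=True)[: N // 10]
rollout_avg = sum(uplift(e) for e in rollout) / len(rollout)

print(f"population-average uplift: {population_avg:.3f}")
print(f"rollout-cohort uplift:     {rollout_avg:.3f}")
```

Under these assumptions the rollout cohort's observed uplift is roughly 60% higher than the true population average, even though nothing about the feature itself changed between the test and the full launch.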

Product and growth decisions are made on effect sizes that reflect the best-case audience, not the typical user — leading to missed targets and misallocated development resources.

Rollout performance metrics stop being reliable predictors of full-population outcomes, so forecasts built on early-cohort data systematically overshoot.

A staged rollout showed strong performance metrics in early cohorts, but performance declined materially as the rollout expanded to a broader user segment. The early-cohort users share characteristics — higher engagement, more recent activity, longer tenure — that are not representative of the full user base.