Lester Leong
The Minimum Viable Experiment: A Framework for Early-Stage Rigor
The Two Failure Modes
Early-stage companies approach experimentation in one of two ways, and both fail for different reasons.
The first mode is no experimentation at all. The team ships on instinct, interprets any growth as validation, and attributes stalls to execution speed rather than strategic direction. Decisions are made by the loudest voice in the room or the most recent customer conversation. This works until it doesn't, which usually coincides with the moment the team can no longer afford the cost of being wrong. By then, the habit of deciding without evidence is deeply embedded and difficult to reverse.
The second mode is premature sophistication. The team reads about how Airbnb or Netflix runs experiments, builds a feature flagging system, integrates a statistical testing framework, and then never actually runs a meaningful experiment because the infrastructure overhead exceeds their capacity. They have the machinery of rigor without the practice of it. The experimentation platform becomes shelfware, and the team quietly reverts to shipping on instinct while maintaining the organizational fiction that they are "data-driven."
Neither mode produces learning. The first generates no evidence. The second generates infrastructure. What early-stage companies actually need is a practice, not a platform.
The Three Components
A minimum viable experiment needs exactly three things. No feature flags. No stats engine. No experimentation platform. Just three commitments written down before the change ships.
A clear question. What specifically do we want to learn? Not "will this feature work?" but "does adding a progress indicator to the onboarding flow increase the percentage of users who complete step 3?" The question must be specific enough that two reasonable people would agree on whether the experiment answered it. If the question is vague, the results will be ambiguous, and ambiguous results change nothing.
A measurable outcome. How will we know? The outcome must be a number the team can observe within a defined time window. "User satisfaction" is not a measurable outcome. "Percentage of users who complete onboarding within their first session" is. The metric does not need to be perfect. It needs to be observable, unambiguous, and available within a timeframe that matches the team's decision cycle. If you cannot measure the result within two weeks, the question is too large for a single experiment.
A pre-committed decision rule. What will we do based on what we observe? This is the component teams skip, and skipping it is what makes most early-stage experimentation theater rather than practice.
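The three commitments can be written down as a plain record before the change ships. The sketch below is illustrative only: the class name, field names, and example values are assumptions, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MinimumViableExperiment:
    """The three commitments, written down before the change ships."""
    question: str         # specific enough that two people agree it was answered
    metric: str           # a number observable within the decision window
    threshold_pct: float  # minimum relative lift that justifies shipping
    window_days: int      # how long we wait before applying the rule

# Example: the onboarding progress-indicator experiment described above.
onboarding_experiment = MinimumViableExperiment(
    question=("Does adding a progress indicator to the onboarding flow "
              "increase the percentage of users who complete step 3?"),
    metric="pct_users_completing_step_3_in_first_session",
    threshold_pct=5.0,   # a hypothetical agreed threshold
    window_days=14,      # matches the two-week decision cycle
)
```

Freezing the record (`frozen=True`) is deliberate: once the experiment starts, the commitments should not be quietly edited.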
Why the Decision Rule Is Everything
Without a pre-committed decision rule, the experiment does not reduce uncertainty. It provides ammunition. Both sides of an internal debate will interpret the same results through their existing beliefs. A 3% improvement becomes "clearly working" to the advocate and "within noise" to the skeptic. The team argues about whether the results are "good enough," and the decision ultimately gets made the same way it would have been made without any data at all. The experiment consumed time and attention without changing the outcome.
The decision rule eliminates this failure mode by forcing agreement on interpretation before the data arrives. The format is simple: "If metric X moves by at least Y% within Z days, we ship the change. If it does not, we revert."
The power of this structure is that it surfaces disagreements at the only moment they can be resolved rationally: before the results exist. If the product lead believes a 3% lift justifies shipping and the engineering lead believes nothing under 10% is worth the maintenance cost, that disagreement needs to happen before the experiment runs. Once data arrives, the conversation is contaminated by anchoring, confirmation bias, and sunk cost. The same two people who could have agreed on a 5% threshold in the abstract will argue for hours about whether 4.7% is "close enough" once they have seen it.
Write the rule first. Debate the threshold. Agree in writing. Then run the experiment. The data does the rest.
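Once the rule is written first, the final step is mechanical. A minimal sketch, using the 5% threshold and 4.7% result from the example above (the function name and rate values are illustrative assumptions):

```python
def apply_decision_rule(baseline: float, observed: float, threshold_pct: float) -> str:
    """Return the pre-committed decision: 'ship' if the relative lift
    meets the threshold, 'revert' otherwise. No 'close enough' branch exists."""
    lift_pct = (observed - baseline) / baseline * 100.0
    return "ship" if lift_pct >= threshold_pct else "revert"

# The 4.7% lift from the example: under a pre-committed 5% threshold,
# the outcome is a revert, with no room for post-hoc debate.
baseline_rate = 0.120
observed_rate = baseline_rate * 1.047  # a 4.7% relative lift
print(apply_decision_rule(baseline_rate, observed_rate, threshold_pct=5.0))
```

The point of the sketch is what it lacks: there is no branch for "arguably close," so the hours-long debate about 4.7% versus 5% cannot happen after the data arrives.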
Common Objections and Their Answers
"We don't have enough traffic for statistical significance." You probably do not need it. Statistical significance matters when the cost of a wrong decision is high and irreversible. At early stage, most decisions are cheap and reversible. If adding a feature increases your target metric from 12% to 18% over two weeks, you do not need a p-value to act on that. You need a directional signal and the discipline to revert if the trend does not hold. Perfect measurement is a luxury. Directional evidence paired with a decision rule is sufficient for most early-stage choices.
"We move too fast for formal experiments." A minimum viable experiment adds 15 minutes of upfront work: writing the question, the metric, and the decision rule. If you cannot invest 15 minutes of clarity before shipping a change that will consume days or weeks of engineering time, the problem is not speed. It is discipline.
"What if the results are inconclusive?" This is precisely why the decision rule matters. "Inconclusive" is not a possible outcome when the rule is pre-committed. Either the metric hit the threshold or it did not. If it did not, you revert or iterate. The only way to get an inconclusive result is to run the experiment without agreeing on what conclusive looks like. That is not a data problem. It is a planning problem.
Rigor Is Not Sophistication
For most early-stage teams, the word "rigor" evokes complexity. They hear "experimentation rigor" and picture A/B testing platforms, multi-armed bandits, and sequential analysis. That association is incorrect and counterproductive.
Rigor at early stage is not statistical sophistication. It is intellectual honesty. It is the willingness to define what you expect to happen, commit to how you will evaluate the outcome, and follow through on the decision you pre-committed to. That practice costs nothing to implement, requires no infrastructure, and produces compounding returns as the organization scales.
The teams that build this discipline early do not just make better individual decisions. They develop an organizational muscle for learning from evidence rather than arguing from intuition. That muscle becomes the foundation for the sophisticated experimentation program they will eventually need. But the foundation is not a platform. It is a habit: question, metric, decision rule. Written down before the change ships. Every time.