eCommerce conversion rate optimization (CRO) and A/B testing
My now good friend and analytics partner Lea Masatsugu and I first met when we were tasked with developing an A/B testing curriculum for a portfolio of direct-to-consumer eCommerce companies spanning the US, Latin America, and Europe. Analytics + Design was a 1+1=3 kind of partnership, where we each multiplied the other's value.
Since then, I've worked with Lea and other analytics collaborators to develop our own framework to use with eCommerce clients in a test, learn, and iterate approach. It helps us efficiently uncover opportunities, form hypotheses, align with business leaders, and collaborate with product teams.
Although clients still present new challenges that compel us to adapt our thinking and process, here's what we've learned so far:
Get the FREE Google Slides template
Use our template to help design your A/B tests, organize your analysis, and schedule each release.
Why A/B testing is worth doing
For most startups, A/B testing is kind of like getting five servings of vegetables per day — you know you should do it, but most people don't. There are so many reasons not to. When A/B testing programs fail to catch on, it's usually for one of these four reasons:
There are too many other things to work on.
The framework we use for conversion rate optimization and A/B testing can actually help to identify and prioritize high-impact, low-effort projects. We look for high-confidence areas of high usage/traffic or high revenue, where even a small lift would have impact.
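To make that concrete, here's a rough back-of-envelope sketch in Python (the traffic, conversion rate, and order value below are made-up placeholders, not client numbers) of how we'd size an opportunity before committing to a test:

```python
# Back-of-envelope sizing: what is a small lift worth on a high-traffic page?
# All inputs are hypothetical placeholders: swap in your own analytics numbers.

monthly_sessions = 200_000      # sessions reaching the page or flow under test
baseline_conversion = 0.025     # current conversion rate for that page or flow
average_order_value = 60.00     # average order value, in your currency
relative_lift = 0.03            # a modest 3% relative improvement

baseline_orders = monthly_sessions * baseline_conversion
incremental_orders = baseline_orders * relative_lift
incremental_revenue = incremental_orders * average_order_value

print(f"Baseline orders per month:     {baseline_orders:,.0f}")
print(f"Incremental orders per month:  {incremental_orders:,.1f}")
print(f"Incremental revenue per month: {incremental_revenue:,.2f}")
```

Run the same math on a feature that only 2,000 visitors see each month and it quickly tells you to spend the effort elsewhere.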
Why test it when you can just release it?
By releasing as a test, you'll not only have a better sense of how this release impacted behavior, but also gain valuable insight by validating or invalidating your hypothesis. Plus, once the process is in place, it's almost as easy to deploy each change as an A/B test as it is to push live to all segments.
It's too much process.
Our hope with this workflow is that it becomes habitual because it's helpful—not bureaucratic overhead (we hate that too). Just as there was a sea change in writing user stories instead of tasks, we believe the exercise of writing a hypothesis helps both in creating and in objectively evaluating the design solution. The rest are guardrails to help non-analytics experts draw insights from data, reach statistical significance, and interpret the results.
We've tried it before, but it didn't move the needle.
For every test that's hyped as a $300M button test, there are just as many 41 Shades of Blue experiments that neither generate sales nor unlock customer insight. A test needs to be focused enough to generate learnings, but not so ambitious that it doesn't get done (or is a nightmare to analyze), and not so small that even conclusive results don't matter.
What testing can and can't accomplish
Another reason why A/B testing can sometimes seem ineffective is when it's being asked to carry too much strategic weight. A/B testing isn't a substitute for product strategy—the team still needs to choose which hill to climb, and A/B testing can be a beacon to help you climb toward that apex.
While you're in the thick of building and growing a product, it's not always easy to tell if you're climbing up to a local maximum or if there's a much larger global maximum next door that would require a pivot to your value proposition and a redesign. For this, I refer to the excellent Nielsen Norman Group article Radical Redesign or Incremental Change?
Realistically, most teams are balancing both visionary strategy questions and the day-to-day fundamentals of running a business. You're incrementally improving towards your current vision, with a healthy side of growth opportunity speculation.
To ensure we don't mix objectives and end up achieving neither, we created two different workflows.
"A failing company with a poor strategy cannot blindly experiment its way to success"
— Jim Manzi, from his book Uncontrolled
Strategy
To help forward-looking strategy, we'd turn our analytics focus toward macro consumer insights to expose larger (and more uncertain) opportunities, and design tests to help reduce risk in any drastic changes.
Optimization
To help optimize the current business, we'd focus on quick wins that will deliver results.
Strategy (hunt for the global maximum; forward-looking): Expose larger business opportunities or lessen uncertainty. Extract macro consumer insights from UX analysis and translate them into tactics to drive behavior change, or design tests to help validate dangerous assumptions when repositioning. Deliverable: 1 deep dive with exposed analysis.

Optimization (optimize for local maxima; rooted in the present): ROI-focused tactics and quick wins. Identify critical points of friction influencing key KPIs and translate hypotheses into high-impact test designs. Deliverable: 3–5 optimization test designs. This is the focus of this article.
The three principles we follow for optimization
Our goal was to create a framework so that A/B testing becomes an engine that reliably delivers real business value in the form of increased revenue or customer insights. It also needs to be lightweight enough to address all the earlier objections: not enough time, not enough impact to justify the time investment, and results that are too often inconclusive or too narrowly applicable to be valuable.
We distilled our approach down to three principles: low effort, high impact, and high confidence.
Let's take a closer look at each of these.
1. Low effort
Keep things simple, both conceptually and executionally. One of the most common reasons teams fall off on testing is that it ends up taking too much time. Even super enthusiastic teams are at risk of overextending — designing excellent but complicated tests that are time-consuming to launch and difficult to analyze. It's almost always better to build and keep momentum by launching several smaller tests than to design one giant test that carries higher pressure to deliver positive results.
Conceptually, this means that nothing is being drastically reevaluated and post-test analysis will also be more manageable.
Executionally, our goal is to fit design and engineering into one sprint. When designing, what's the leanest implementation to test this hypothesis?
2. High impact
There's always going to be interesting data everywhere. From a conversion rate optimization standpoint, this is a trap we worked hard to avoid.
One trap of interesting data is wasting time in the analysis stage. Whenever you open Google Analytics, there's always going to be an anomaly begging for investigation. It's dangerously addictive to go down each of these "interesting data" rabbit holes and try to solve each one as a puzzle — before you know it, hours have flown by and you don't have anything actionable.
Another trap is investing time implementing clever solutions that might have a large impact (even doubling or tripling conversions!), but on a feature that only reaches a tiny segment of visitors. Unless there's a valuable customer learning, we'd consider this a failure from a conversion rate optimization standpoint.
To avoid analysis paralysis and to identify high-impact potential tests, we followed these three criteria:
3. High confidence
Lastly, we optimized for the hypotheses we were most confident in, rather than speculating on unknowns. This meant:
Test Design Format
Opportunity discovery
Each test design is divided into four prompts to help guide your analysis: the objective data observed, your interpretation of the data and the insight it suggests, the hypothesis, and finally your proposed test design.
Data & observations
Interpretation & insights
Hypothesis
Design to test
Test design template
This page charts all the details of the test that will be entered into an A/B testing tool like Optimizely or Google Optimize.
You'll need to make strategic decisions around which KPIs determine success, and think through what assumptions you may be baking into the test.
You'll also need to calculate sample size and run time to achieve statistical significance using a tool like Adobe's Calculator.
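If you'd rather script the math than plug numbers into a calculator, here's a minimal sketch of the same calculation using Python and statsmodels. It assumes a 50/50 two-variant split, a two-sided test at 95% confidence and 80% power, and placeholder baseline and traffic numbers:

```python
import math

from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Hypothetical inputs: replace with your own baseline and traffic.
baseline_cr = 0.025       # current conversion rate of the page or flow under test
relative_mde = 0.10       # minimum detectable effect: a 10% relative lift
weekly_sessions = 30_000  # sessions per week entering the test (across both variants)

variant_cr = baseline_cr * (1 + relative_mde)

# Effect size for a two-proportion comparison, then solve for the sample size
# per variant at alpha = 0.05 (95% confidence) and 80% power.
effect_size = proportion_effectsize(baseline_cr, variant_cr)
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, ratio=1.0, alternative="two-sided"
)

# Round the run time up to full weeks to match the scheduling approach below.
total_sample = 2 * n_per_variant
weeks = math.ceil(total_sample / weekly_sessions)

print(f"Sample size per variant: {n_per_variant:,.0f}")
print(f"Estimated run time: {weeks} week(s) at {weekly_sessions:,} sessions/week")
```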
Measurement & Analysis
Before the test runs, complete a row to help yourself track the test and think through how you'll analyze the test results. Once results are in, where will you deep dive to unpack why this happened? What did you learn? How will you iterate? What is the next follow-on test based on your learnings here?
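Your testing tool will usually report significance for you, but if you're pulling raw counts into this sheet, a quick two-proportion z-test is a reasonable sanity check. A minimal sketch, again with made-up numbers:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results exported from your testing tool after the run completes.
conversions = [1_120, 1_210]  # converters in [control, variant]
sessions = [44_800, 44_950]   # sessions in [control, variant]

z_stat, p_value = proportions_ztest(count=conversions, nobs=sessions, alternative="two-sided")

control_cr = conversions[0] / sessions[0]
variant_cr = conversions[1] / sessions[1]
relative_lift = (variant_cr - control_cr) / control_cr

print(f"Control CR: {control_cr:.2%}   Variant CR: {variant_cr:.2%}")
print(f"Relative lift: {relative_lift:+.1%}   p-value: {p_value:.3f}")
```

The p-value only tells you whether the difference is likely real; the deep dive into why it happened, and what to iterate on next, still belongs in this row.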
Development, Launch, and Analysis Schedule
To maintain momentum and ensure product, design, and engineering are aligned on scheduling, this planner helps you loosely sketch out a testing schedule so that no live tests overlap.
Each cell is a full week to avoid day-of-week effects influencing results (e.g. starting on a Monday and ending on a Friday may skew higher by avoiding the low-activity weekend). If a test requires more or less development, adjust the blocks as needed.
Development and live-testing/analysis can be done in tandem (the goal is to always have a live test to maintain momentum, so there's no need to work strictly serially: develop, push, analyze, then develop again).
Although live tests should not overlap, mobile and desktop tests may optionally be live simultaneously if there's no crossover.
Test Design / Example One
The company is a direct-to-consumer retailer that serves both a highly technical, loyal customer base and a growing segment of people who are curious about starting this hobby but don't yet have the technical knowledge to make sense of a large catalogue of products.
Data & observations
Interpretation & insights
Hypothesis
Design to test
Test Design / Example Two
The company is a direct-to-consumer retailer that recently began focusing on the mobile experience after more traffic shifted to mobile. Some of the features added are unique to mobile, such as this "product added" modal.
Data & observations
Interpretation & insights
This may indicate that the size of the modal and the links it surfaces are adding extra steps and friction instead of streamlining the shopping experience.
Shoppers may be adding more to their basket to help meet the minimum shipping requirement.
Hypothesis
Design to test
Test Design / Example Three
The company is a direct-to-consumer retailer that sells a wide assortment of products in many categories.
Data & observations
Interpretation & insights
Hypothesis
Design to test
Control, keeping the PDP exactly as is.
Category tiles specific to the current product's category, to help people who want to stay nearby.
Hypothesis:
People are landing on a product that's close, but it may not be the exact product they're looking for, or they may want to assess all of their options in the range before committing.
Execution:
Use breadcrumb information architecture to inform categories displayed.
Quickly launch into the most popular or interesting sections of the site to inspire more browsing and capture/shape intent.
Hypothesis:
Customers may not have a specific intent, and general hooks may inspire additional browsing or discovery. These customers may be more suggestible and not on a specific journey.
Execution:
Use general navigation categories.
In conclusion
We hope this structure and these examples help you on your testing journey.
Questions? Ideas? Feedback? Let us know how you've been using this workbook with your company on Twitter: @menghe
Get the FREE Google Slides template
Use our template to help design your A/B tests, organize your analysis, and schedule each release.
Credits
Thank you Lea Masatsugu, Maya Bhat, and Dana Lee for all you've taught me about analytics and asking the right questions, and mostly for always finding time to partner with me — none of this would be possible without you.