To help you master the lingo and become a more effective marketer, we've assembled this comprehensive list of A/B testing terms. Whether you're a seasoned professional, new to the industry, or just curious about how cutting-edge marketers convert leads, this valuable resource will help you stay informed and up-to-date. If you come across a term we haven't covered, don't hesitate to leave a comment with the word and its definition.
A/A Testing: A/A testing is a method used in website optimization where the same webpage or other marketing material is tested against itself. It is mainly conducted to check that the testing tools are working properly and are not reporting spurious differences. It helps ensure the accuracy and reliability of A/B testing data by confirming that any measured differences in performance are due to random variation rather than the testing setup or system errors.

A/B testing: A/B testing or split testing is a method of comparing two versions of a web page or other user experience to determine which one performs better. It's a way to test changes to your webpage against the current design and determine which one produces better results. It's done by showing the two variants, A and B, to two similar visitor groups and comparing the engagement or conversion rate to determine which version is more effective.
Above the fold refers to the portion of a webpage that is immediately visible in the browser viewport when the page first loads, without any scrolling required. The term originates from newspaper publishing, where the most important content appeared on the top half of the folded front page.
The exact dimensions of above-the-fold content vary based on device type, screen size, and browser configuration, but typically refers to approximately the first 600-1000 pixels of vertical content on desktop and 400-700 pixels on mobile. This prime digital real estate receives the most attention and engagement from visitors, as many users never scroll below it.
Above-the-fold content is critical in A/B testing because it has the greatest impact on first impressions, engagement, and conversion rates. Changes to this area typically produce larger effect sizes than below-the-fold modifications. However, testing above-the-fold elements requires careful attention to flickering issues, as visual changes in this area are most noticeable and can significantly impact user experience and test validity.
An e-commerce site tests moving their value proposition from below the fold to above the fold in the hero section. The test shows a 15% increase in add-to-cart rate, demonstrating how critical above-the-fold placement is for key messaging that influences purchase decisions before users scroll.
Adobe Commerce (formerly Magento) is an enterprise-grade, open-source e-commerce platform offering extensive customization and scalability for large, complex online retail operations.
As part of Adobe's Experience Cloud, Adobe Commerce provides both self-hosted and cloud-hosted options with advanced features for B2B and B2C commerce, multi-store management, and complex catalog requirements. The platform is built for businesses requiring high levels of customization, with robust APIs and a flexible architecture that supports sophisticated business logic. Adobe Commerce typically serves mid-market to enterprise retailers with significant technical resources.
The platform's complexity and flexibility make it ideal for sophisticated A/B testing and personalization strategies, but require specialized technical knowledge for implementation. Adobe Commerce's integration with Adobe's suite of marketing and analytics tools enables comprehensive optimization programs linking testing with customer data and targeting. Understanding this platform is crucial for CRO professionals working with enterprise retailers who need advanced experimentation capabilities.
An enterprise retailer uses Adobe Commerce's integration with Adobe Target to run multivariate tests across product pages, simultaneously testing images, descriptions, and pricing displays while segmenting results by customer lifetime value from Adobe Analytics.
Alpha is the significance level threshold used in hypothesis testing that represents the probability of making a Type I Error, or the acceptable risk of detecting a false positive result.
Commonly set at 0.05 (5%) in A/B testing, alpha defines how much evidence you require before declaring a test result statistically significant. When you set alpha to 0.05, you're stating that you're willing to accept a 5% chance of concluding there's a difference when none actually exists. Lower alpha values (like 0.01) make you more conservative, reducing false positives but requiring stronger evidence to detect true effects.
Choosing the right alpha level balances your risk tolerance with the ability to detect genuine improvements, directly impacting how you interpret test results and make business decisions. A more stringent alpha (lower value) protects against false positives but requires larger sample sizes and longer test durations. Most A/B testing practitioners use alpha = 0.05 as the industry standard, though high-stakes decisions may warrant more conservative thresholds.
You set alpha at 0.05 for a pricing test, meaning you'll only declare the new price successful if there's less than a 5% probability the observed improvement happened by chance. If your p-value is 0.03, you reject the null hypothesis and implement the change.
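As an illustration, here is a minimal sketch of how that decision rule looks in code, using a standard two-proportion z-test; the visitor and conversion counts are hypothetical, and real testing platforms may use different calculations.

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical pricing-test results
control_conversions, control_visitors = 500, 10_000   # 5.0% conversion rate
variant_conversions, variant_visitors = 590, 10_000   # 5.9% conversion rate

alpha = 0.05  # acceptable Type I error rate

# Pooled two-proportion z-test
p1 = control_conversions / control_visitors
p2 = variant_conversions / variant_visitors
p_pooled = (control_conversions + variant_conversions) / (control_visitors + variant_visitors)
standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / control_visitors + 1 / variant_visitors))
z = (p2 - p1) / standard_error
p_value = 2 * norm.sf(abs(z))  # two-tailed p-value

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("Statistically significant at alpha = 0.05: reject the null hypothesis.")
else:
    print("Not significant at alpha = 0.05: fail to reject the null hypothesis.")
```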
Alternative Hypothesis is the statement in hypothesis testing that proposes there is a real, measurable difference between the control and treatment variations in an A/B test.
Denoted as H₁ or Hₐ, the alternative hypothesis is what you're trying to find evidence for in your experiment. It directly opposes the null hypothesis and represents the claim that your variation causes a change in the metric you're measuring. Alternative hypotheses can be one-tailed (directional, predicting improvement or decline) or two-tailed (non-directional, simply predicting a difference). Most A/B tests use one-tailed alternatives because you're specifically testing whether a variation performs better.
Clearly defining your alternative hypothesis before running a test ensures you're measuring the right metrics and sets the foundation for proper statistical analysis. It helps determine your required sample size, informs whether you should use a one-tailed or two-tailed test, and guides the interpretation of results. A well-formulated alternative hypothesis includes the specific metric, direction of change, and ideally the minimum detectable effect you care about.
For a button color test, your alternative hypothesis might state: 'Changing the CTA button from blue to red will increase the click-through rate by at least 10% compared to the control.' This stands in contrast to your null hypothesis that there's no difference between the two colors.
Analysis: Analysis in marketing refers to the process of examining and interpreting data or information to guide business decisions. It involves gathering data from various sources, such as sales figures, customer feedback, and market trends, and then using that data to evaluate the effectiveness of your marketing strategies, identify opportunities for improvement, and make informed decisions about future marketing efforts. Analysis can be basic, such as looking at click-through rates, or more complex, like customer segmentation or predictive modeling.
An anti-flickering script is a code snippet that temporarily hides page content while an A/B testing tool loads and applies variations, preventing visitors from seeing the original content before it changes to the test variation. It eliminates the visual flash that occurs during variation rendering.
Also called a flicker-fix or anti-flicker snippet, this script typically hides the body element or specific page sections using CSS, then reveals content once the testing tool has determined which variation to show and applied the necessary changes. The hiding duration is usually capped with a timeout (typically 3-4 seconds) to prevent pages from remaining hidden if the testing script fails to load.
Anti-flickering scripts are crucial for maintaining a professional user experience and ensuring test validity. Without them, visitors may see jarring content shifts that reduce trust and engagement, potentially skewing test results. However, they must be implemented carefully as they can negatively impact page load performance and SEO if content remains hidden too long.
A retail site testing different hero images might use an anti-flickering script to hide the banner area for 500 milliseconds while the A/B testing tool determines which image to display. Without this script, visitors would briefly see the original image before it flashes to the test variation, creating a poor experience.
Asynchronous loading is a technique where web page elements, scripts, or resources load independently without blocking the rendering of other page content. Scripts marked as asynchronous download in parallel with page parsing and execute as soon as they're available, without waiting for or delaying other resources.
In contrast to synchronous loading, where each resource must fully load before the next begins, asynchronous loading allows multiple resources to download simultaneously. For JavaScript files, the async attribute tells browsers to continue parsing HTML while the script downloads, then execute the script immediately upon completion. This prevents long-running scripts from blocking page rendering and improves perceived performance.
Asynchronous loading is critical for A/B testing implementations because most testing tools load asynchronously to avoid blocking page rendering. While this improves overall page performance, it creates the potential for flickering since page content may render before test variations are applied. Understanding this trade-off helps optimize test implementations to balance performance with experience quality.
An A/B testing platform loads its JavaScript library asynchronously, allowing the page to render immediately while the testing code downloads. The page displays in 1.5 seconds instead of 3 seconds with synchronous loading, but requires an anti-flickering script to prevent users from seeing content changes when the test code executes after the initial render.
Average Revenue per User (ARPU): Average Revenue per User (ARPU) is a performance metric that illustrates the average revenue generated from each user or customer of your service or product within a specific time frame. It is calculated by dividing the total revenue made from customers or users by the total number of users within that time period. ARPU is used to analyze trends in revenue growth and customer engagement.
Baseline: A baseline in marketing is the starting point by which you measure change or improvement in a campaign or strategy. It's a reference point that allows you to compare past performance to current performance after implementing new changes or strategies. By establishing a baseline, you can measure the effectiveness of your marketing efforts and identify areas for improvement. For example, if your baseline Email Clickthrough Rate is 10%, and after implementing a new strategy it increases to 15%, the strategy resulted in a 5 percentage point improvement (a 50% relative lift).
Bayes theorem is a mathematical formula that describes how to update the probability of a hypothesis based on new evidence, forming the foundation of Bayesian A/B testing by combining prior beliefs with observed data to produce posterior probabilities.
The theorem states that the posterior probability is proportional to the likelihood of the observed data given the hypothesis multiplied by the prior probability of the hypothesis. In A/B testing, this allows experimenters to continuously update their beliefs about which variation performs better as data accumulates. Bayes theorem provides a principled framework for learning from data while accounting for existing knowledge and uncertainty.
Bayes theorem enables a more flexible and intuitive approach to A/B testing compared to traditional frequentist methods. It allows for continuous monitoring without multiple testing penalties, earlier decision-making based on probability statements, and natural incorporation of prior knowledge. Understanding Bayes theorem is essential for implementing and interpreting Bayesian experimentation frameworks.
Using Bayes theorem, you start with a prior belief about your email signup form's 15% conversion rate, then as 1,000 test visitors interact with a new variation, you continuously update your probability estimate that the new form is better, refining your posterior distribution with each new data point collected.
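For conversion rates, Bayes theorem is most often applied through the conjugate Beta-Binomial model, where the posterior is simply the prior's pseudo-counts plus the observed successes and failures. A minimal sketch, assuming made-up prior strength and visitor counts:

```python
from scipy.stats import beta

# Prior belief of roughly a 15% conversion rate, encoded as hypothetical pseudo-counts
prior_successes, prior_failures = 15, 85

# Hypothetical data observed from the new variation
conversions, visitors = 180, 1_000

# Bayes theorem with a conjugate Beta prior:
# posterior = Beta(prior_successes + conversions, prior_failures + non-conversions)
posterior = beta(prior_successes + conversions,
                 prior_failures + (visitors - conversions))

print(f"Posterior mean conversion rate: {posterior.mean():.3f}")
print(f"P(conversion rate > 15%):       {posterior.sf(0.15):.3f}")
```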
Bayesian Statistics is a statistical approach that treats probability as a degree of belief and continuously updates the probability of a hypothesis being true as new data is collected during an A/B test.
Unlike frequentist methods, Bayesian approaches incorporate prior knowledge or beliefs into the analysis and express results as probability distributions rather than binary significant/not-significant outcomes. Bayesian A/B testing provides statements like 'the probability that Variant B is better than Control is 94%' and allows you to stop tests early or peek at results without inflating error rates. This approach uses credible intervals instead of confidence intervals and calculates the expected loss of choosing each variation.
Bayesian methods offer more intuitive interpretations of test results, making it easier for stakeholders to understand the probability of success and potential risk. They're particularly valuable for businesses that need to make faster decisions, can't wait for fixed sample sizes, or want to incorporate domain expertise into the analysis. However, Bayesian approaches require careful selection of priors and more complex calculations than traditional frequentist methods.
Your Bayesian A/B test shows there's an 87% probability that the new checkout flow is better than the current one, with an expected lift of 6-12%. Even though this hasn't reached 95% certainty, you decide to implement it because the potential upside outweighs the minimal expected loss.
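Probability statements like "there's an 87% probability that the new flow is better" are typically approximated by drawing samples from each variation's posterior distribution. A rough sketch, assuming a flat Beta(1, 1) prior and hypothetical counts:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical results: (conversions, visitors) for each variation
control_conversions, control_visitors = 300, 10_000
variant_conversions, variant_visitors = 345, 10_000

# Sample from each posterior Beta distribution (flat Beta(1, 1) prior)
control = rng.beta(1 + control_conversions, 1 + control_visitors - control_conversions, 100_000)
variant = rng.beta(1 + variant_conversions, 1 + variant_visitors - variant_conversions, 100_000)

prob_variant_better = (variant > control).mean()
expected_relative_lift = (variant / control - 1).mean()

print(f"P(variant beats control): {prob_variant_better:.1%}")
print(f"Expected relative lift:   {expected_relative_lift:.1%}")
```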
Below the fold is derived from the print newspaper terminology where the most important stories were placed "above the fold" to grab the attention of potential buyers. Similarly, in the digital space, below the fold refers to the portion of a webpage that is not immediately visible when the page loads, and the user must scroll down to see it.
Below the fold is important because it determines how users engage with a website. It can affect a user’s first impression of your site and whether they decide to stay or leave. Additionally, it has an impact on ad placement and revenue, as ads above the fold tend to have higher visibility and click-through rates.
This region is not measured strictly in pixels, as screen sizes and resolutions vary between devices and users. Instead, it is defined relative to the visitor's viewport: any content that requires scrolling to see is considered below the fold.
With the prevalence of mobile devices, considering below the fold content becomes tricky. Mobile screen sizes are significantly smaller, and users are more accustomed to scrolling on their devices. Hence, an effective design might be to encourage scrolling with engaging content rather than cramming everything above the fold.
Analyzing website usage can give insights into how users interact with your website. Through tracking tools like Google Analytics, one can understand how far users scroll, what they click on, and how long they spend on your page. This data can then be used to tailor your website design and placement of key elements effectively.
Benchmarking: Benchmarking is the process of comparing your business processes or performance metrics against the industry's best practices or standards. It aims to identify gaps, improve on operations, and track performance. This helps companies to understand where they stand in the market and strategize on how to become more competitive.
Beta is the probability of making a Type II Error in hypothesis testing, representing the risk of failing to detect a true difference between variations when one actually exists.
Beta is inversely related to statistical power, where power = 1 - β. If you set beta at 0.20 (20%), your test has 80% power, meaning an 80% chance of detecting a real effect if it exists. Beta is determined by your sample size, the minimum detectable effect you want to identify, and your alpha level. Most A/B testing best practices recommend aiming for beta ≤ 0.20 (power ≥ 80%).
Understanding and controlling beta helps you design tests with adequate statistical power to detect meaningful improvements, preventing you from abandoning genuinely better variations due to inconclusive results. Reducing beta requires increasing sample size, which directly impacts test duration and resource allocation. Power analysis using beta calculations should be performed before launching tests to ensure you collect enough data to reach reliable conclusions.
You conduct a power analysis showing you need 50,000 visitors per variation to achieve beta = 0.20 (80% power) for detecting a 5% lift. Running the test with only 10,000 visitors would increase beta significantly, risking that you'll miss a real improvement.
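A back-of-the-envelope power analysis can be sketched with the standard two-proportion approximation; the baseline rate and lift below are illustrative, not taken from the example above.

```python
from math import ceil
from scipy.stats import norm

def visitors_per_variation(baseline, relative_lift, alpha=0.05, beta=0.20):
    """Approximate sample size per variation for a two-sided two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the significance level
    z_beta = norm.ppf(1 - beta)        # critical value for the desired power (1 - beta)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Hypothetical: 3% baseline conversion rate, detecting a 5% relative lift with 80% power
print(f"Visitors needed per variation: {visitors_per_variation(0.03, 0.05):,}")
```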
BigCommerce is a SaaS e-commerce platform that provides enterprise-level features and flexibility for mid-market to large online retailers without transaction fees.
Positioned as a more robust alternative to entry-level platforms, BigCommerce offers built-in features like advanced SEO, multi-channel selling, and no transaction fees on any plan. The platform provides greater customization flexibility than competitors through its Stencil theme engine and open API architecture. BigCommerce targets growing businesses that need enterprise capabilities without the complexity of fully custom solutions.
For optimization professionals, BigCommerce's built-in features and API flexibility offer more native testing capabilities compared to some competitors, reducing reliance on third-party apps. The platform's focus on performance and SEO makes it important to ensure A/B tests don't negatively impact these areas. Understanding BigCommerce's architecture helps CRO specialists leverage platform-specific features while implementing comprehensive testing programs.
An optimization team uses BigCommerce's native Google Analytics integration and open API to implement server-side A/B tests on product recommendation algorithms, testing which personalization approach drives higher average order values without impacting page load speed.
Bounce rate: Bounce rate is a metric that represents the percentage of visitors who enter your website and then leave ("bounce") without viewing any other pages or taking any further action. It essentially means they have not interacted more deeply with the site. This metric is often used as an indicator of the quality or relevance of a page's content or user experience. The lower the bounce rate, the better, as it suggests that visitors are finding the page engaging and are more likely to explore other areas of your website.
A CDN (Content Delivery Network) is a geographically distributed network of servers that cache and deliver website content from locations closest to end users, reducing latency and improving page load speeds. It stores copies of static assets like images, CSS, JavaScript, and videos across multiple data centers worldwide.
CDNs work by routing user requests to the nearest edge server rather than the origin server, significantly reducing the physical distance data must travel. When a user requests a webpage, the CDN serves cached content from a nearby location while only retrieving uncached or dynamic content from the origin server. Modern CDNs also provide additional services like DDoS protection, SSL/TLS encryption, and real-time analytics.
For A/B testing and CRO, CDNs are essential for maintaining fast page load speeds, which directly correlate with conversion rates and user engagement. Many enterprise A/B testing platforms operate at the CDN edge layer, enabling faster variation delivery and reduced flickering. However, CDN caching can complicate testing by serving stale content or causing inconsistent experiences if cache invalidation isn't properly managed.
An international e-commerce site uses a CDN to serve A/B test variations globally. Customers in Australia load the page in 1.2 seconds from a Sydney edge server, while European customers load from Amsterdam in 0.9 seconds, compared to 4-5 seconds if everyone accessed the origin server in Virginia. This speed improvement increases conversions by 8% across all markets.
Call-to-Action (CTA): A Call-to-Action (CTA) is a prompt on a website that tells the user to take some specified action. This can be in the form of a button, link, or image designed to encourage the user to click and continue down a conversion funnel. A CTA might be something like 'Buy Now', 'Sign Up', 'Download' or 'Learn More', aiming to persuade the user to move further into a marketing or sales cycle.
Chance to win: In A/B testing, the chance to win (also called probability to beat baseline or probability to be best) is a Bayesian metric that expresses how likely it is that a given variation outperforms the control, given the data collected so far. Testing tools report it as a percentage, for example "Variation B has a 92% chance to win," and it is commonly used alongside expected loss as a criterion for deciding when to stop a test.
A chi-square test is a statistical method used to determine whether there is a significant association between categorical variables, most commonly applied in A/B testing to compare conversion rates or other binary outcome metrics between variations.
The chi-square test compares observed frequencies (actual conversions and non-conversions in each variation) against expected frequencies (what would occur if there were no difference between variations). It produces a test statistic and p-value that indicate whether the observed pattern of results is likely due to the test variation or random chance. This test is ideal for analyzing proportions, percentages, and count data.
Chi-square tests are the standard statistical method for evaluating A/B tests focused on conversion rates, click-through rates, and other percentage-based metrics. They provide a rigorous framework for decision-making about whether to implement changes based on binary outcomes. Most A/B testing platforms use chi-square tests or similar methods under the hood to calculate statistical significance for conversion metrics.
In testing two different call-to-action buttons, you observe 450 conversions from 10,000 visitors in variation A versus 520 conversions from 10,000 visitors in variation B. A chi-square test determines whether this difference in conversion rates (4.5% vs 5.2%) is statistically significant or could reasonably occur by chance.
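Using the counts from this example, here is a minimal sketch of the test with scipy; the default Yates continuity correction for 2x2 tables may make the result slightly more conservative than an uncorrected calculation.

```python
from scipy.stats import chi2_contingency

# Contingency table: rows are variations, columns are [converted, did not convert]
observed = [
    [450, 10_000 - 450],   # variation A (4.5%)
    [520, 10_000 - 520],   # variation B (5.2%)
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("The difference in conversion rates is statistically significant.")
else:
    print("The observed difference could reasonably be due to chance.")
```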
Click Through Rate (CTR): The Click Through Rate (CTR) is a metric that measures the number of clicks advertisers receive on their ads per number of impressions. It is a critical measurement for understanding the efficiency and effectiveness of a specific marketing campaign or advertisement. It's calculated by dividing the total number of clicks by the number of impressions (views) and multiplying by 100 to get a percentage. This helps to understand how well your keywords, ads and landing pages are performing from a user engagement perspective.
Cohort: A cohort is a group of users who share a common characteristic or experience within a designated time period. In marketing, cohorts are often used for analyzing behaviors and trends or making comparisons among groups. For example, a cohort could be all users who signed up for a newsletter in a specific month or people who made a purchase within the first week of visiting a website. This tool is useful in A/B testing and helps in understanding the impact of different factors on user behavior.
Confidence interval: A confidence interval is a range of values, derived from a statistical calculation, that is likely to contain an unknown population parameter. In marketing, it is often used in A/B testing to determine if the variation of a test actually improves the result. The confidence interval gives us a defined range where we expect the true value to fall, based on our desired confidence level. If the interval is wide, it means our results may not be very reliable, whereas a narrow interval indicates a higher level of accuracy.
Confidence level: A confidence level is the statistical measure in an A/B test that expresses how much assurance the testing procedure provides about the reliability of a result; it is the complement of alpha (alpha = 0.05 corresponds to a 95% confidence level). A 95% confidence level means that if the same experiment were repeated many times, about 95% of the resulting confidence intervals would contain the true difference between the two versions; it is not the probability that this particular result is correct. Higher confidence levels reduce the probability of false positives in experiments.
Confounding variables are external factors that influence both the independent variable (the change being tested) and the dependent variable (the metric being measured), creating a false or misleading association between them.
In A/B testing, confounding variables can corrupt test results by introducing bias that makes it appear one variation is performing better or worse than it actually is. These variables are not part of the intended test design but affect outcomes nonetheless. Common confounding variables include seasonality, traffic source changes, browser updates, or marketing campaigns running simultaneously with the test.
Uncontrolled confounding variables can lead to incorrect conclusions and poor business decisions based on flawed test results. Proper randomization and controlled testing environments help minimize their impact. Identifying and accounting for potential confounders is essential for ensuring test validity and making reliable optimization decisions.
If you launch an A/B test on the same day your company starts a major TV advertising campaign, the increased traffic and brand awareness from the ads could be a confounding variable, making it impossible to determine whether conversion rate improvements came from your test variation or from the advertising boost.
Control: In the context of A/B testing and marketing, a control is the original, unchanged version of a webpage, email, or other piece of marketing content that is used as a benchmark to compare against a modified version, known as the variant. The performance of the control versus the variant helps determine whether the changes lead to improved results, like higher clickthrough rates, conversions, or other goals.
Control Group: A Control Group refers to a set of users in an A/B test who are exposed to the existing or 'control' version of your website, product, or marketing campaign. This group is used to compare the behavior and performance against those who experienced the new or ‘test’ version. It's a necessary component for A/B testing as it helps to establish a baseline and measure the impact of any changes.
Conversion rate: The conversion rate is the percentage of users who take a desired action on your website or in your marketing campaign. It's calculated by dividing the number of conversions by the total number of visitors. For example, if your web page had 50 conversions from 1,000 visitors, then your conversion rate would be 5%. Depending on your goal, a conversion could be anything from a completed purchase, a sign-up to a newsletter, or downloading a resource.
Cookie: A Cookie is a small piece of data stored on a user's computer by the web browser while browsing a website. These cookies help websites remember information about the user's visit, like preferred language and other settings, thus providing a smoother and more personalized browsing experience. They also play a crucial role in user-tracking, helping in website analytics, and personalizing advertisements.
Correlation: In marketing, correlation is a statistical measurement that describes the relationship between two variables. It is used to understand the influence of one variable on another. A positive correlation means that both variables move in the same direction, a negative correlation means they move in opposite directions. Correlation helps marketers analyze data and predict future trends or behaviors. However, it’s important to remember the principle that correlation does not imply causation - just because two variables correlate does not mean that one directly causes the other to occur.
Covariance: Covariance is a statistical measure that helps you understand how two different variables move together. It's used to gauge the linear relationship between these variables. A positive covariance means the variables move in the same direction, while a negative covariance indicates they move in opposite directions. If the covariance is zero, it suggests there is no linear relationship between the variables. This concept is widely used in risk management, portfolio theory and A/B testing to understand the impact of changes on different variables.
A credible interval is a range of values within which a parameter (such as conversion rate or effect size) lies with a specified probability in Bayesian analysis, representing the uncertainty around an estimate after observing data.
Unlike frequentist confidence intervals, credible intervals can be directly interpreted as probability statements about the parameter of interest. A 95% credible interval means there's a 95% probability that the true value falls within that range, given the data and prior beliefs. Credible intervals are derived from the posterior distribution and naturally incorporate all sources of uncertainty in the analysis.
Credible intervals provide an intuitive way to communicate uncertainty and effect sizes to stakeholders, avoiding the common misinterpretations associated with confidence intervals. They enable better risk assessment by clearly showing the range of plausible outcomes. When credible intervals for the difference between variations exclude zero, this provides strong evidence that one variation outperforms the other.
Your Bayesian A/B test analysis shows the new landing page has a conversion rate with a 95% credible interval of 4.2% to 5.8%, meaning you can be 95% certain the true conversion rate lies within this range, compared to the control's credible interval of 3.1% to 4.3%.
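A credible interval can be read directly off the posterior distribution. A minimal sketch, assuming a flat Beta(1, 1) prior and hypothetical counts (not the figures from the example above):

```python
from scipy.stats import beta

def credible_interval(conversions, visitors, level=0.95):
    """Equal-tailed credible interval from a Beta posterior with a flat Beta(1, 1) prior."""
    posterior = beta(1 + conversions, 1 + visitors - conversions)
    tail = (1 - level) / 2
    return posterior.ppf(tail), posterior.ppf(1 - tail)

# Hypothetical landing-page results
low, high = credible_interval(conversions=250, visitors=5_000)
print(f"95% credible interval for the conversion rate: {low:.1%} to {high:.1%}")
```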
DebugBear is a website performance monitoring and optimization tool that provides continuous tracking of Core Web Vitals, page speed metrics, and detailed performance analysis for web pages.
DebugBear specializes in monitoring performance over time with scheduled tests from multiple locations, offering detailed waterfall charts, filmstrip views, and performance budgets. The platform is particularly useful for tracking the performance impact of A/B tests and website changes, providing alerts when metrics degrade. It integrates with CI/CD pipelines and offers comparative analysis to understand performance trends.
For A/B testing and CRO practitioners, DebugBear helps ensure that optimization experiments don't inadvertently harm page performance and Core Web Vitals scores. The tool can identify when testing platforms or specific variants introduce performance regressions that might suppress conversion rates. Using DebugBear alongside A/B testing platforms enables teams to separate performance-related conversion impacts from the actual design or messaging changes being tested.
A CRO team uses DebugBear to monitor their homepage A/B test and discovers that Variant C, despite showing promising engagement metrics, has increased LCP from 2.1s to 3.8s due to an unoptimized hero image, explaining why the variant's conversion rate remained flat despite higher scroll depth.
Drag-and-drop Technology is a user interface feature in A/B testing and website building tools that allows users to visually move, add, or modify elements on a webpage without writing code.
This technology democratizes website experimentation by providing visual editors where users can click on page elements and make changes through intuitive interfaces rather than editing HTML, CSS, or JavaScript directly. Most modern A/B testing platforms include drag-and-drop editors for creating test variations, allowing marketers to build experiments independently. While convenient, these tools sometimes generate inefficient code or have limitations with complex page structures.
Drag-and-drop interfaces significantly reduce the technical barriers to running A/B tests, enabling marketing teams to launch experiments without developer resources for every variation. This acceleration of testing velocity allows organizations to run more experiments and iterate faster on optimization insights. However, CRO professionals must balance convenience with performance considerations, as visual editors can sometimes introduce page flicker or slower load times compared to native implementations.
A marketing manager uses the drag-and-drop editor in VWO to create a test variation by moving the testimonials section above the product features on a landing page, launching the experiment in 15 minutes without submitting a developer ticket.
An Ecommerce Platform is software that enables businesses to build, manage, and operate online stores, providing essential functionality for product display, transactions, and order management.
These platforms range from hosted SaaS solutions to open-source systems, offering varying levels of customization, scalability, and built-in features. Common capabilities include product catalog management, shopping cart functionality, payment processing, inventory tracking, and customer management. The choice of platform significantly impacts a business's ability to scale, customize, and optimize the shopping experience.
The ecommerce platform directly influences what A/B testing and CRO strategies are possible, as each platform has unique capabilities, limitations, and integration requirements. Understanding platform-specific constraints around checkout customization, page speed, and third-party integrations is essential for developing effective optimization roadmaps. Different platforms require different testing tool implementations and may restrict experimentation on certain pages or elements.
A CRO consultant develops different testing strategies for two clients based on their platforms: implementing advanced checkout tests for a client on Adobe Commerce with full customization access, while focusing on pre-checkout optimizations for a client on basic Shopify due to checkout restrictions.
Effect Size: Effect size refers to the magnitude or intensity of a statistical phenomenon or experiment result. In simpler terms, it measures how big of an effect a certain factor or variable has in a study or test. It provides context for statistical significance and can help you to understand the practical significance, or real-world impact, of your findings. In marketing, it may describe the extent to which a particular marketing campaign or strategy has impacted sales, customer engagement, or any other target metric.
Engagement rate: Engagement rate is a metric used in digital marketing to measure the level of interaction or engagement that a piece of content receives from an audience. It includes actions like likes, shares, comments, clicks etc., relative to the number of people who see or are given the opportunity to interact with your content. It helps brands understand how their content is resonating with their audience and whether it's leading to meaningful interaction.
Entry Page: An Entry Page is the first page that a visitor lands on when they come to your website from an external source, such as a search engine, social media link, or another website. It acts as the first impression of your website for many visitors. It's important that these pages are optimized, engaging, and easy to navigate to ensure user satisfaction and promote further interaction with your site.
Error Rate: The error rate is the percentage of errors that occur in a certain process or action, often in reference to online activities or technical processes. In a marketing context, it might refer to the percentage of failed or incorrect actions such as unsuccessful page loads or incomplete transactions. It's important to monitor and minimize error rates to improve user experience and data integrity.
Exit Intent: Exit intent is a technology used in digital marketing to detect when a site visitor is about to leave the website or page. It usually triggers a pop-up or special message attempting to convince the user to stay on the page or take some action like signing up for a newsletter, purchasing a product, or downloading a resource. It's a proactive way to reduce bounce rate and improve conversions.
Exit Page: An exit page refers to the last web page that a visitor views before they leave your website. It's where the visitor's session on your site ends. Analyzing exit pages can provide useful insights for understanding why users leave your website from these specific pages, which can inform strategies to improve user experience or increase conversion rates.
Exit Rate: The exit rate is the percentage of visitors who leave your website from a specific page. This metric is used to identify which pages are the final destination before a visitor leaves, indicating possible issues with those pages. It's calculated by dividing the total number of exits from a page by the total number of visits to that page. Unlike bounce rate, exit rate also considers visitors who might have navigated to different pages on your website before leaving.
Expected loss is the average amount of value (revenue, conversions, or other metrics) you would lose by choosing a particular variation if it turns out to be inferior, calculated by integrating the loss function over the posterior probability distribution.
Expected loss represents the risk associated with each possible decision in an A/B test, weighted by the probability of each outcome. It's calculated separately for the decision to implement each variation, accounting for all scenarios in which that choice could be wrong and their associated costs. When the expected loss of choosing the best-performing variation becomes acceptably small, you have sufficient evidence to conclude the test.
Expected loss provides a practical, business-oriented stopping criterion for A/B tests that's more meaningful than p-values or confidence levels. It directly answers the question 'how much could we lose by making this decision now?' enabling teams to balance the cost of uncertainty against the cost of delayed implementation. Using expected loss thresholds aligned with business tolerance for risk leads to more efficient testing and better ROI from experimentation programs.
Your test shows variation B leading with a 2.5% conversion rate versus control's 2.3%, but the expected loss of choosing B is still $3,500 per week, exceeding your $1,000 risk threshold. You continue the test until more data reduces the expected loss to an acceptable level before implementing the change.
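Expected loss is usually approximated by Monte Carlo sampling from the posterior distributions. The sketch below uses hypothetical counts and expresses the loss in conversion-rate terms; multiplying by traffic and average order value would convert it to revenue, as in the example above.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical results, flat Beta(1, 1) prior
control_conversions, control_visitors = 230, 10_000   # ~2.3%
variant_conversions, variant_visitors = 250, 10_000   # ~2.5%

control = rng.beta(1 + control_conversions, 1 + control_visitors - control_conversions, 200_000)
variant = rng.beta(1 + variant_conversions, 1 + variant_visitors - variant_conversions, 200_000)

# Expected loss of a decision = average shortfall in the scenarios where that choice is wrong
loss_if_ship_variant = np.maximum(control - variant, 0).mean()
loss_if_keep_control = np.maximum(variant - control, 0).mean()

print(f"Expected loss if we ship the variant: {loss_if_ship_variant:.4%} (conversion rate)")
print(f"Expected loss if we keep the control: {loss_if_keep_control:.4%} (conversion rate)")
```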
Experience optimization, often abbreviated as EXO, refers to the use of various techniques, tools, and methodologies to improve the user experience during interactions with a product, system, or service. This could be an online experience, such as website navigation or mobile app use, or offline experiences such as customer service or sales interactions.
EXO matters for several reasons.
- Better experiences lead to customer loyalty, and loyal customers are more likely to recommend your products/services to others.
- Optimizing experiences can boost metrics like customer engagement and conversion rates, leading to increased revenue.
Experience optimization is typically accomplished through a combination of methods, including:
- A/B testing - comparing two versions of a web page or other user experience to see which performs better
- Multivariate testing - testing multiple variables to see which combination works best
- User research - gaining insights into user behaviors and preferences
- Analytics - understanding usage patterns and trends
Digital experience platforms can also help deliver diverse and personalized experiences to different customer segments.
The process of digital experience optimization often involves trying new things, challenging assumptions, questioning the so-called "common wisdom," and continuously exploring new ideas based on actual data. This empowers the team to make data-driven decisions rather than just relying on intuition or guesswork.
Experience optimization should be an ongoing process rather than a one-time effort: as technology continues to advance and customer expectations continually evolve, the experiences you offer must also keep pace.
An iterative approach to optimization allows you to continuously learn, adapt, and improve, leading to valuable discoveries and incremental improvements over time. This sort of continuous optimization is what ultimately leads to superior customer experience and business performance.
False Negative: A false negative is a result that indicates no effect when a real effect actually exists. In marketing terms, a false negative could be when a test fails to identify a genuine improvement in a campaign, ad, or email. This can stall progress in marketing efforts, as it suggests that a strategy isn't working when it actually is.
False Positive: A false positive in marketing terms refers to a result that incorrectly indicates that a particular condition or effect is present. For instance, in A/B testing, a false positive could occur when a test indicates that a new webpage design is significantly better at driving conversions when in reality it is not. It typically happens due to random chance, errors in data collection, or flaws in testing procedures.
Flickering is the brief visual flash or content shift that occurs when a page initially loads with original content and then visibly changes to display an A/B test variation after the testing script executes. It creates a jarring user experience where visitors see the page transform before their eyes.
Flickering happens because most client-side A/B testing tools operate asynchronously—the page begins rendering before the testing script determines which variation to show and applies the changes. This delay, often just milliseconds, is enough for users to perceive the original content before it switches. Flickering is most noticeable with above-the-fold changes and on slower connections.
Flickering damages user experience, reduces trust, and can significantly bias test results by causing visitors to bounce or behave differently than they would with a smooth page load. It's particularly problematic for tests involving major visual changes and can make variation pages appear slower or lower quality than control. Minimizing flickering is essential for both valid results and maintaining brand perception.
An e-commerce site testing a new navigation menu experiences flickering when visitors briefly see the old menu structure for half a second before it suddenly transforms into the new layout. This visual glitch confuses users and may cause them to click before the layout stabilizes, creating invalid interaction data.
Frequentist Statistics is the traditional statistical approach used in A/B testing that determines whether results are significant by calculating the probability of observing the data (or more extreme data) if the null hypothesis were true.
This approach treats probability as the long-run frequency of events and relies on p-values, confidence intervals, and fixed significance thresholds (alpha) to make decisions. Frequentist methods require pre-determined sample sizes and don't incorporate prior beliefs into the analysis. The methodology assumes that with infinite repetitions of an experiment, the true effect would be captured within the confidence interval a certain percentage of the time.
Frequentist statistics remains the most widely used and accepted approach in A/B testing, providing a standardized framework that's well-understood across industries and regulatory bodies. It offers strong protection against false positives when proper procedures are followed, including avoiding peeking at results before reaching the predetermined sample size. Understanding frequentist methods is essential for designing rigorous tests, interpreting results from most A/B testing platforms, and communicating findings credibly.
You design a frequentist A/B test requiring 40,000 visitors per variation, set alpha at 0.05, and commit to not analyzing results until reaching that sample size. After collecting the data, you calculate a p-value of 0.02, leading you to reject the null hypothesis and conclude the treatment significantly outperformed the control.
Funnel: A funnel in marketing refers to the journey that a potential customer takes from their first interaction with your brand to the ultimate goal of conversion. It's often described as a funnel because many people will become aware of your business or product (the widest part of the funnel), but only a portion of those will move further down the funnel to consider your offering, and even fewer will proceed to the final step of making a purchase (the narrowest part of the funnel). It's crucial for businesses to study and optimize this process to increase conversion rates.
Geolocalization: Geolocalization is the process of determining or estimating the real-world geographic location of an internet connected device, such as a computer, mobile phone, or server. This location information, usually given in terms of latitude and longitude coordinates, can be used for a variety of purposes, such as delivering tailored advertising or content, improving location-based search results, and even for security or fraud prevention measures.
HTTP Requests are individual calls made by a web browser to a server to fetch resources like HTML files, stylesheets, scripts, images, fonts, and other assets needed to render a webpage.
Each HTTP request involves network overhead including DNS lookup, connection establishment, and data transfer, making the total number of requests a key performance indicator. Modern web pages can generate dozens to hundreds of requests, though HTTP/2 and HTTP/3 have reduced the performance penalty of multiple requests through multiplexing. Reducing unnecessary requests remains a fundamental web optimization strategy.
When A/B testing, additional HTTP requests from testing tools, tracking pixels, or variant-specific resources can significantly impact page load performance and skew test results. Each test variant should be monitored for request count to ensure that performance differences don't confound the measurement of the actual design or copy changes being tested. High request counts particularly affect users on slower networks or mobile connections, potentially creating a bias against feature-rich variants.
A checkout page test shows Variant B underperforming by 8% in conversions; investigation reveals it makes 23 additional HTTP requests for customer testimonial images and third-party trust badges, increasing load time by 1.4 seconds and causing users to abandon before seeing the content.
Heatmapping: Heatmapping is a data visualization technique that shows where users have clicked, scrolled, or moved their mouse on your website. It uses colors to represent different levels of activity - warm colors like red and orange signify areas where users interact the most, while cool colors like blue signify less interaction. Heatmaps help you analyze how effective your webpage is, showing which parts of your page are getting the most attention and which parts users are ignoring, thus providing insights on how to improve user experience and conversion rates.
Hypothesis: A hypothesis in marketing terms is an assumed outcome or predicted result of a marketing campaign or strategy before it is implemented. It is a statement that forecasts the relationship between variables, such as how a change in a marketing approach (like altering a CTA button color) might affect conversions. A hypothesis is typically based on research and data, and it's tested and validated through A/B testing or other forms of experimentation.
Hypothesis Test: A Hypothesis Test is a statistical method used in A/B testing where you evaluate the evidence for or against a claim about a population parameter. In the context of A/B testing, it's a way to assess whether a particular change (like a new webpage design or marketing strategy) actually increases conversions or other key metrics. The objective of a hypothesis test is to determine which outcome— the original version (A) or the new version (B)— is more effective.
LCP (Largest Contentful Paint) is a Core Web Vital metric that measures how long it takes for the largest visible content element on a page to fully render from when the user first navigates to the URL.
This user-centric performance metric focuses on perceived load speed by tracking when the main content becomes visible, typically the largest image, video, or text block above the fold. Google considers LCP under 2.5 seconds as good, 2.5-4 seconds as needs improvement, and over 4 seconds as poor. LCP is one of three Core Web Vitals that directly impact Google search rankings and user experience quality.
A/B testing implementations can significantly impact LCP through additional JavaScript execution, delayed rendering, or content flicker, potentially harming both user experience and SEO performance. CRO professionals must monitor LCP when running experiments to ensure testing infrastructure doesn't slow page loads enough to reduce conversions or search visibility. Optimizing LCP itself can be a powerful conversion lever, as faster-loading pages typically see improved engagement and conversion rates.
After implementing a new A/B testing tool, an ecommerce site notices their product page LCP increased from 2.1 seconds to 3.8 seconds due to render-blocking test scripts, prompting them to switch to asynchronous loading to maintain both testing capabilities and page performance.
Landing Page Optimization: Landing Page Optimization refers to the process of improving or enhancing each element on your landing page to increase conversions. These elements may include the headline, call-to-action, images, or copy. The goal is to make each page as impactful and effective as possible, encouraging visitors to complete a certain action like signing up for a newsletter, making a purchase, or filling out a form. This is often achieved through A/B testing different versions of a page to see which performs better.
Largest Contentful Paint (LCP) is a Core Web Vitals metric that measures the time it takes for the largest visible content element (image, video, or text block) to render on the screen from when the page first starts loading.
LCP is part of Google's Core Web Vitals and focuses on perceived load speed from the user's perspective. A good LCP score is 2.5 seconds or less, while anything over 4 seconds is considered poor. The metric specifically tracks the largest element within the viewport, which typically represents when the main content has loaded.
LCP directly impacts user experience and SEO rankings, making it critical for A/B testing page speed optimizations. Poor LCP scores can lead to higher bounce rates and lower conversion rates, as users perceive slow-loading pages negatively. When running experiments that modify page layouts or content, monitoring LCP ensures that performance improvements don't come at the cost of user experience.
An e-commerce site testing two product page variants notices that Variant B, which uses larger hero images, has an LCP of 4.2 seconds compared to Variant A's 2.1 seconds, explaining why Variant B shows a 15% higher bounce rate despite having a more visually appealing design.
The Law of Large Numbers is a statistical principle stating that as sample size increases, the observed average of results will converge toward the true expected value of the population.
In A/B testing, this theorem explains why larger sample sizes produce more reliable and accurate results that better represent actual user behavior. The law guarantees that random fluctuations and outliers have less impact on outcomes as more data is collected. This mathematical principle underpins the requirement for sufficient sample sizes before declaring test winners.
Understanding this law helps practitioners avoid premature test conclusions based on early data that may not represent true performance differences. It explains why tests need adequate traffic and time to reach statistical significance, preventing costly decisions based on misleading early trends. This principle is fundamental to determining appropriate sample sizes and test duration in experimental design.
In the first 100 visitors to an A/B test, Variation B shows a 50% conversion lift, but after 10,000 visitors, the Law of Large Numbers reveals the true effect is only a 5% improvement as early randomness evens out with larger sample size.
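A quick simulation, with a made-up true conversion rate, shows the convergence the law describes:

```python
import numpy as np

rng = np.random.default_rng(0)
true_rate = 0.05  # hypothetical true conversion rate

# The observed conversion rate drifts toward the true 5% as the sample grows
for n in (100, 1_000, 10_000, 100_000):
    observed_rate = rng.binomial(1, true_rate, n).mean()
    print(f"n = {n:>7,}: observed conversion rate = {observed_rate:.3%}")
```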
Long-run frequency is a frequentist interpretation of probability that defines the likelihood of an event as the proportion of times it would occur if an experiment were repeated infinitely under identical conditions. It represents the observed frequency of outcomes over many trials rather than a subjective belief.
This concept underlies frequentist statistical approaches commonly used in A/B testing, where probabilities are viewed as objective properties of repeatable experiments. For example, a 95% confidence interval means that if you ran the same test 100 times, approximately 95 of those intervals would contain the true parameter value. The interpretation strictly relates to repeated sampling, not to the probability of a single event.
Understanding long-run frequency is essential for correctly interpreting A/B test statistics like p-values and confidence intervals. It clarifies that statistical significance relates to what would happen across many repetitions of the experiment, not the probability that a specific result is true. This interpretation prevents common misunderstandings about what confidence levels actually mean in test results.
When an A/B testing tool reports a 95% confidence level, it means that if you repeated this exact test 100 times with different random samples, approximately 95 of the resulting confidence intervals would contain the true difference between the variations. It does not mean there's a 95% probability that Variation B is the winner in this specific test.
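The long-run interpretation can be checked by simulation: construct many 95% confidence intervals from repeated samples of the same population and count how often they contain the true rate. A rough sketch with made-up parameters:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
true_rate, n, repetitions = 0.05, 5_000, 1_000
z = norm.ppf(0.975)  # critical value for a 95% confidence interval

covered = 0
for _ in range(repetitions):
    p_hat = rng.binomial(n, true_rate) / n
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - z * se <= true_rate <= p_hat + z * se:
        covered += 1

print(f"{covered / repetitions:.1%} of the intervals contain the true rate (expected: about 95%)")
```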
A loss function quantifies the cost or negative consequence of making a wrong decision in A/B testing, typically measuring the expected loss in revenue, conversions, or other key metrics that would result from choosing an inferior variation.
In Bayesian A/B testing, loss functions formalize the business impact of decision errors by calculating the opportunity cost of selecting the worse variation. Different loss functions can reflect different business priorities, such as maximizing conversions, minimizing downside risk, or optimizing revenue. The expected loss for each decision (implementing A or B) is calculated across the entire posterior distribution.
Loss functions connect statistical analysis directly to business outcomes, enabling data-driven decisions based on actual costs rather than arbitrary significance thresholds. They help determine when to stop a test by quantifying whether the potential gain from continuing outweighs the certain cost of delayed implementation. Using loss functions allows teams to make economically rational decisions that balance statistical uncertainty against business impact.
In testing two pricing strategies, your loss function calculates that if you incorrectly choose the worse option, you'd lose an expected $15,000 per month in revenue. When the expected loss of choosing either variation drops below your $2,000 threshold, you have sufficient confidence to make a decision.
Metrics: Metrics are measurements or data points that track and quantify various aspects of marketing performance. These can include factors like click-through rates, conversion rates, bounce rates, and more. Metrics are used to assess the effectiveness of marketing campaigns, strategies, or tactics, allowing you to understand what's working well and what needs improvement in your marketing efforts.
The Minimum Detectable Effect (MDE) is a crucial concept in experiment design and A/B testing. It represents the smallest change in a metric that an experiment can reliably detect. Understanding the MDE is essential for effective hypothesis testing and ensuring your experiments have sufficient statistical power.
In simple terms, the MDE is the tiniest change or effect in a certain metric that your study or experiment can consistently identify. It's a key factor in determining the sample size needed for your experiment and plays a vital role in data analysis.
Let's break it down with an example:
Imagine you're running an A/B test to improve your website's sign-up rate. Your current sign-up rate (control variant) is 10%, and you want to test a new design (treatment variant). What's the smallest improvement you'd consider meaningful? This is where the MDE comes in.
The smaller the MDE, the more subtle changes your experiment can detect. However, detecting smaller effects typically requires larger sample sizes.
Statistical power is closely related to the MDE. It represents the probability of detecting a true effect when it exists. A power analysis helps determine the sample size required for your experiment to keep the risk of Type II errors (false negatives) acceptably low.
Here's how MDE and statistical power work together: for a fixed sample size, choosing a smaller MDE lowers your power to detect it, while demanding higher power (or a smaller MDE) requires more traffic. Sample size, MDE, significance level, and power are interconnected, so fixing any three of them determines the fourth.
To calculate the MDE, you need to consider several factors: your baseline conversion rate, the significance level (alpha), the desired statistical power, and the sample size or traffic you can realistically collect during the test.
There are various online calculators and tools available to help you determine the MDE for your experiments, or you can estimate the required sample size for a given MDE yourself, as in the sketch below.
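The sketch below uses the standard two-proportion sample-size approximation with assumed inputs (a 10% baseline, 95% significance, 80% power); treat it as a rough estimate rather than a replacement for your testing tool's calculator:

```python
import math
from scipy.stats import norm

def visitors_per_arm(baseline, mde_relative, alpha=0.05, power=0.80):
    """Approximate visitors needed per variation for a two-sided two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the significance level
    z_beta = norm.ppf(power)            # critical value for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

for mde in (0.05, 0.10, 0.20):  # relative MDEs of 5%, 10%, and 20%
    print(f"Relative MDE {mde:.0%}: ~{visitors_per_arm(0.10, mde):,} visitors per variation")
# Halving the MDE roughly quadruples the required sample size.
```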
Grasping the concept of MDE is crucial for several reasons: it keeps you from launching underpowered tests, tells you how long a test realistically needs to run, and helps you judge whether a potential improvement is big enough to be worth testing at all.
The Minimum Detectable Effect is a fundamental concept in experimentation and A/B testing. By understanding and correctly applying MDE in your experiment design and data analysis, you can ensure that your tests are properly powered and capable of detecting meaningful changes. This knowledge will help you make more informed decisions and reduce the risk of false conclusions in your experimental efforts.
Remember, effective experimentation is about precise measurement and thoughtful analysis, not guesswork. By mastering concepts like MDE, you'll be better equipped to design and interpret experiments across various domains, from website optimization to product development.
Multi-Armed Bandit: A multi-armed bandit is a statistical method used in marketing for testing multiple strategies, offers, or options concurrently to determine which one performs best. It is similar to A/B testing, but instead of splitting the audience evenly among all options, a multi-armed bandit test dynamically adjusts the traffic allocation to each option based on its ongoing performance. The name comes from the casino slot machine (the "one-armed bandit"), where each "arm" is a different strategy or option and the "bandit" is the uncertain payout. This method allows quicker, more efficient decision-making than traditional fixed-allocation testing.
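As a rough illustration, here is a minimal epsilon-greedy bandit sketch (one common allocation strategy among several) with made-up conversion rates, showing traffic shifting toward the strongest offer:

```python
import numpy as np

rng = np.random.default_rng(7)
true_rates = [0.04, 0.05, 0.065]   # hidden conversion rates of three offers (assumed)
pulls = np.zeros(3)
wins = np.zeros(3)
epsilon = 0.10                      # fraction of traffic kept for exploration

for _ in range(20_000):
    if rng.random() < epsilon or pulls.min() == 0:
        arm = rng.integers(3)                      # explore: show a random offer
    else:
        arm = int(np.argmax(wins / pulls))         # exploit: show the best offer so far
    pulls[arm] += 1
    wins[arm] += rng.random() < true_rates[arm]    # simulated visitor converts or not

print("Traffic share per offer:", np.round(pulls / pulls.sum(), 3))
# Most traffic ends up on the third offer, the one with the highest true rate.
```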
Multiple Testing is a statistical challenge that occurs when conducting multiple simultaneous hypothesis tests or comparisons, increasing the probability of finding false positive results purely by chance.
When running multiple A/B tests concurrently or comparing multiple variants against a control, the risk of Type I errors (false positives) compounds with each additional comparison. For example, using a 95% confidence level means a 5% chance of false positive per test; running 20 tests simultaneously gives approximately a 64% chance of at least one false positive. Statistical corrections like Bonferroni, Benjamini-Hochberg, or sequential testing methods help control for this inflation of error rates.
Multiple testing problems can lead optimization teams to implement changes based on spurious results, wasting development resources and potentially harming user experience or revenue. Without proper statistical corrections, the more experiments a team runs, the more likely they are to make incorrect decisions. Understanding and accounting for multiple testing is essential for maintaining the integrity of an experimentation program, especially for teams running dozens or hundreds of tests annually.
A marketing team runs 15 simultaneous A/B tests on different page elements, finding three 'statistically significant winners' at p<0.05. After applying Bonferroni correction to account for multiple testing, only one of the three remains significant, preventing them from implementing two changes that would have likely had no real impact or potentially harmed conversions.
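A minimal sketch of how such corrections can be applied, using statsmodels and a set of invented p-values:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from 15 simultaneous element tests
p_values = [0.003, 0.021, 0.048, 0.07, 0.11, 0.15, 0.22, 0.28,
            0.31, 0.39, 0.44, 0.52, 0.61, 0.73, 0.88]

for method in ("bonferroni", "fdr_bh"):
    reject, adjusted, _, _ = multipletests(p_values, alpha=0.05, method=method)
    winners = [i for i, r in enumerate(reject) if r]
    print(f"{method}: tests still significant after correction -> {winners}")
# Bonferroni is the most conservative; Benjamini-Hochberg (fdr_bh) controls the
# false discovery rate and usually retains more true effects.
```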
Multivariate Analysis is a statistical technique used to analyze data involving more than one variable. This process allows marketers to understand how different variables (like design, color, location, etc.) interact and how, together, they impact the final results or visitor behavior. It's often used alongside A/B testing when you want to see the effect of multiple variations in a campaign all at once.
Multivariate Testing (MVT) is a process where multiple variables on a webpage are simultaneously tested to determine the best performing combinations and layouts. Unlike A/B testing that tests one change at a time, MVT allows you to test numerous changes and see how they interact with each other. The goal of MVT is to identify the most effective version of your webpage, considering all the different elements and their combinations. This could help improve a webpage's performance in terms of factors such as click-through rates, conversions, or any other key performance indicator.
Normalization: Normalization is a process used in data analysis to adjust the values measured on different scales to a common scale. This is often done in preparation for data comparison or statistical analysis, ensuring the results are accurate and meaningful. By normalizing data, one can remove any biases or anomalies that might disrupt the analysis. For example, normalizing sales data from different regions takes into account variations in population size, thereby allowing for a fair comparison.
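As a small illustration with invented figures, the per-capita adjustment described above might look like this:

```python
# Hypothetical regional sales and population figures
regions = {
    "North": {"sales": 120_000, "population": 400_000},
    "South": {"sales": 95_000,  "population": 150_000},
}

for name, data in regions.items():
    per_capita = data["sales"] / data["population"]
    print(f"{name}: ${per_capita:.2f} in sales per resident")

# Raw sales make North look stronger, but normalizing by population shows
# South actually sells more per resident.
```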
Null Hypothesis: A null hypothesis is a statistical concept that assumes there is no significant difference or relation between certain aspects of a study or experiment. In other words, it's the hypothesis that your test is aiming to disprove. For example, in an A/B test, the null hypothesis might be that there's no difference in conversion rates between version A and version B of a webpage. If the test results show a significant difference, then you can reject the null hypothesis.
Null Hypothesis Significance Testing (NHST) is a statistical method used to determine whether observed differences between test variations are statistically significant or likely due to random chance. It involves testing a null hypothesis that assumes no difference exists between variations against an alternative hypothesis that a difference does exist.
In A/B testing, NHST begins with the assumption that both variants perform identically (the null hypothesis). Statistical tests calculate the probability (p-value) of observing the measured difference if the null hypothesis were true. If this probability falls below a predetermined threshold (typically 0.05), the null hypothesis is rejected, suggesting a statistically significant difference exists.
NHST provides the mathematical foundation for determining whether A/B test results are reliable or merely random fluctuations. It helps prevent false conclusions by quantifying the likelihood that observed performance differences are genuine, enabling data-driven decisions about which variation to implement. Without NHST, experimenters risk making costly changes based on statistical noise.
If Variation B shows a 3% conversion rate increase over Control A, NHST helps determine whether this 3% lift is a real improvement or could have occurred by chance. A p-value of 0.02 would mean that, if there were truly no difference between the variations, a lift at least this large would appear only about 2% of the time, supporting the conclusion that Variation B genuinely performs better.
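A minimal sketch of how such a p-value can be computed with a two-proportion z-test; the conversion counts are hypothetical and chosen so the result lands roughly near 0.02:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical results: Variation B converts 5.5%, Control A converts 5.0%
conversions = [1105, 1000]      # successes for B, A
visitors = [20_000, 20_000]     # visitors for B, A

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p-value = {p_value:.3f}")

# The p-value is the probability of seeing a difference at least this large
# if the null hypothesis (no real difference) were true -- not the probability
# that the observed difference is due to chance.
```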
One-Tailed Test: A one-tailed test is a statistical method used in hypothesis testing. It's a directional test that determines whether a value is greater than, or less than, a specific reference point, but not both. The "one tail" in this test refers to testing the statistical probability in only one direction, or 'tail', of the distribution instead of both. This test is often employed in business A/B testing scenarios or scientific research where one only cares whether a parameter is greater (or only whether it is lesser) than a baseline, not simply different from it.
Optimization in marketing terms refers to the process of making changes and adjustments to various components of a marketing campaign to improve its effectiveness and efficiency. These modifications may involve aspects such as website design, ad copy, SEO strategies or other marketing tactics. The goal is to improve the rate of conversions, maximize engagement, and achieve better results in relation to your business objectives.
P-hacking: P-hacking, also known as data dredging, is a method in which data is manipulated or selection criteria are modified until a desired statistical result, typically a statistically significant result, is achieved. It involves testing numerous hypotheses on a particular dataset until the data appears to support one. This practice can lead to misleading findings or exaggerated statistical significance. P-hacking is generally considered a problematic and unethical practice in data analysis.
P-value: A p-value in marketing A/B testing is a statistical measure that helps determine whether the difference in conversion rates between two versions of a page is statistically significant or just due to chance. It represents the probability of observing a difference at least as large as the one measured if there were actually no difference between the versions. Typically, if the p-value is less than 0.05 (5%), the result is considered statistically significant, meaning a difference this large would be unlikely to appear by chance alone.
Personalization refers to the method of tailoring the content and experience of a website or marketing message based on the individual user's specific characteristics or behaviors. These may include location, browsing history, past purchases, and other personal preferences. The goal of personalization is to engage customers more effectively by delivering relevant and personalized content, improving their overall user experience.
Personalization Testing: This is a process of customizing the user experience on a website or app by offering content, recommendations, or features based on individual user’s behavior, preferences, or demographics. The purpose of personalization testing is to determine the most effective personalized experience that encourages a user to take desired action such as making a purchase, signing up for a newsletter or any other conversion goals. It often involves A/B testing different personalized elements to see which version performs best.
Population: In marketing, the population refers to the total group of people that a company or business is interested in reaching with their marketing efforts. This might be all potential customers, a specific geographic area, or a targeted demographic. It is this 'population' that marketing strategies and campaigns are created for, in order to effectively promote a product or service.
Posterior probability is the updated probability of a hypothesis being true after taking into account new evidence or data, calculated using Bayesian statistical methods by combining prior beliefs with observed experimental results.
In Bayesian A/B testing, the posterior probability represents your refined understanding of which variation is truly better after seeing the test data. It's calculated by updating your prior probability distribution with the likelihood of the observed data using Bayes' theorem. The posterior probability is typically expressed as the probability that variation B beats variation A, providing an intuitive measure for decision-making.
Posterior probabilities offer a more intuitive interpretation than traditional p-values, directly answering the question 'what's the probability that this variation is actually better?' This makes results easier to communicate to stakeholders and enables better decision-making under uncertainty. Posterior probabilities also allow you to incorporate prior knowledge and make decisions earlier by quantifying the risk of choosing the wrong variation.
After running a Bayesian A/B test for one week, your analysis shows a posterior probability of 94% that the new checkout flow is better than the current one, meaning there's a 94% chance it truly has a higher conversion rate based on the data observed and your prior assumptions.
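A minimal Monte Carlo sketch of how a "probability that B beats A" figure like this is computed, assuming Beta posteriors and invented checkout counts:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical one-week results for the checkout test
a_conv, a_n = 310, 9_800    # current flow
b_conv, b_n = 365, 9_750    # new flow

# Uniform Beta(1, 1) priors updated with the observed data
post_a = rng.beta(1 + a_conv, 1 + a_n - a_conv, size=100_000)
post_b = rng.beta(1 + b_conv, 1 + b_n - b_conv, size=100_000)

prob_b_beats_a = np.mean(post_b > post_a)
print(f"Posterior probability that the new flow is better: {prob_b_beats_a:.1%}")
```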
Power of a Test: This term refers to the ability of a statistical test to detect a difference when one actually exists. It measures the test’s sensitivity or its capacity to correctly identify true effects. Depending on the context, true effects could mean distinguishing between two different marketing campaigns, product versions, or anything similar. A test with high power reduces the risk of committing a Type II error, which happens when the test fails to detect a true difference or effect.
Prior belief is the probability distribution representing your initial assumptions or existing knowledge about a parameter (such as conversion rate) before collecting new data from an experiment, serving as the starting point for Bayesian analysis.
In Bayesian A/B testing, prior beliefs formalize what you already know or assume about your metrics before the test begins, whether from historical data, domain expertise, or complete uncertainty. Priors can be informative (based on specific previous data) or uninformative (assuming little prior knowledge). As test data accumulates, the prior is combined with the likelihood of the observed data to produce the posterior distribution.
Properly specified priors allow you to incorporate existing knowledge into your analysis, potentially reaching reliable conclusions faster than starting from scratch. They make the assumptions underlying your analysis explicit and transparent. Using informative priors based on historical performance can improve estimation accuracy, especially early in a test when data is limited, leading to more efficient experimentation.
Before testing a new pricing page, you set a prior belief that the conversion rate will be around 3% with some uncertainty, based on six months of historical data showing the current page converts at 2.8-3.2%. This prior is then updated with data from the new test to calculate posterior probabilities.
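One way to encode that kind of historical knowledge is a Beta prior whose mean sits near 3%. The parameters below are illustrative, not a recommendation:

```python
from scipy.stats import beta

# An informative prior for a ~3% conversion rate, based on assumed historical data.
# Beta(30, 970) has mean 30 / (30 + 970) = 3% and is worth about 1,000 prior visitors.
prior = beta(30, 970)

low, high = prior.ppf([0.025, 0.975])
print(f"Prior mean: {prior.mean():.1%}")
print(f"95% of prior belief falls between {low:.1%} and {high:.1%}")
# If this range feels too narrow or too wide for your confidence in the historical
# data, scale both parameters up or down before running the test.
```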
Probability is a statistical term that measures the likelihood of an event happening. In marketing, it's used to predict outcomes such as the chance a visitor will click a link, buy a product, or engage with content. It ranges from 0 (the event will definitely not happen) to 1 (the event will definitely happen). Interpreting probability can help to make informed decisions and optimize marketing strategies.
Probability Distribution: A probability distribution is a mathematical function that gives the probability of each possible outcome of an experiment. In simple words, it shows the set of all possible outcomes of a certain event and how likely each one is to occur. It can be represented as a graph, table, or equation that assigns a probability (a number between 0 and 1) to each possible outcome. In marketing, a probability distribution might be used to predict sales outcomes or response rates.
Randomization: Randomization in marketing refers to the method of assigning participants in a test, such as an A/B test, to different groups without any specific pattern. It ensures that the test is fair and unbiased, and that any outcome differences between the groups can be attributed to the changes being tested, not some pre-existing factor or variable. It's a key component in running effective, reliable experiments in marketing.
Randomization bias occurs when the process of randomly assigning users to test variations is flawed or compromised, resulting in systematic differences between groups that can skew test results.
While randomization is meant to create equivalent groups for comparison, technical implementation errors or non-random factors can introduce bias. This can happen due to improper hashing algorithms, cookie deletion patterns, or users being assigned based on characteristics correlated with the outcome. True randomization should ensure each user has an equal probability of being assigned to any variation regardless of their attributes.
Randomization bias undermines the fundamental assumption of A/B testing that groups are comparable at baseline. When present, it becomes impossible to attribute differences in outcomes solely to the variations being tested rather than pre-existing group differences. Ensuring proper randomization is critical for generating valid, actionable insights from experiments.
If your testing tool assigns mobile users disproportionately to the control group and desktop users to the variation, you've introduced randomization bias because device type often correlates with conversion behavior, making it unclear whether results stem from your changes or device differences.
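Proper assignment is usually implemented with deterministic hashing. The sketch below is a simplified illustration (hypothetical user IDs and device mix), including a quick balance check:

```python
import hashlib
from collections import Counter

def assign_variation(user_id: str, experiment: str) -> str:
    """Deterministically bucket a user: the same user always gets the same variation."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "B" if int(digest, 16) % 2 else "A"

# Sanity check: assignment should be roughly 50/50 and independent of device type
counts = Counter()
for i in range(20_000):
    device = "mobile" if i % 3 else "desktop"   # hypothetical device mix
    counts[(device, assign_variation(f"user-{i}", "checkout-test"))] += 1

for key in sorted(counts):
    print(key, counts[key])
# If mobile users land in one group far more often than the other,
# the assignment is biased and the test results cannot be trusted.
```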
Regression Analysis is a statistical method used in marketing to understand the relationship between different variables. It helps predict how a change in one variable, often called the independent variable, can affect another variable, known as the dependent variable. For example, it could be used to see how changes in advertising spend (independent variable) might impact product sales (dependent variable). This technique is often used for forecasting, time trending, and determining cause and effect relationships.
Retention refers to the ability to keep or hold on to something, such as customers or users, over a certain period of time. In marketing, it's about the strategies and tactics businesses use to encourage customers to continue using their product or service, rather than switching to a competitor. High customer retention means customers tend to stick with your product or service, which often translates to customer loyalty and higher profits.
Return on Investment (ROI) is a performance measure that is used to evaluate the efficiency or profitability of an investment, or to compare the efficiency of different investments. It's calculated by dividing the profit from an investment (return) by the cost of that investment. The higher the ROI, the better the investment has performed. In marketing, ROI could mean the amount of revenue generated from a campaign compared to the cost of running that campaign.
Revenue Per Visitor (RPV) is a measure used in online business to determine the amount of money generated from each visitor to a website. It's calculated by dividing the total revenue by the total number of visitors. It's helpful in understanding the effectiveness of your website or marketing campaigns in generating revenue.
STTV is the acronym for Start Time To Variant, representing the duration between page load initiation and the moment an A/B test variant becomes visible to users.
STTV is a key performance indicator for evaluating A/B testing platform efficiency, particularly for client-side implementations. It encompasses the time required for test script loading, variant assignment logic execution, and DOM manipulation to display the correct experience. Industry best practice targets STTV under 100-200 milliseconds to minimize user experience impact.
Monitoring STTV helps optimization teams identify when their testing infrastructure itself becomes a performance bottleneck. Excessive STTV creates cumulative layout shift, harms Core Web Vitals scores, and can suppress conversion rates across all test variants. Teams often use STTV as a deciding factor when choosing between client-side, server-side, or edge-based testing architectures.
An analytics dashboard shows that mobile users experience an average STTV of 1.2 seconds while desktop users see only 400ms, prompting the team to implement asynchronous loading specifically for mobile variants to reduce the performance gap.
Sample size: Sample size refers to the number of individual data points or subjects included in a study or experiment. In the context of A/B testing or marketing, the sample size is the total number of people or interactions (like email opens, webpage visits, or ad viewers) you measure to gather data for your test or analysis. A larger sample size can lead to more accurate results because it offers a more representative snapshot of your overall audience or market.
Script Execution Time is the duration the browser's JavaScript engine spends parsing, compiling, and running JavaScript code on a webpage.
This metric specifically measures CPU processing time rather than download time, representing how long JavaScript blocks the main thread from responding to user interactions. Script execution time is particularly impactful on mobile devices with slower processors and can cause pages to feel unresponsive even after visual content has loaded. The metric includes both initial execution and any subsequent JavaScript-triggered updates to the page.
In A/B testing implementations, excessive script execution time from testing platforms or variant-specific JavaScript can delay interactivity and harm conversion rates independent of the actual changes being tested. This is especially problematic for client-side testing tools that must execute complex targeting logic and DOM manipulation. Monitoring script execution time helps teams identify when testing infrastructure or specific variants create performance bottlenecks that compromise the validity of test results.
An A/B test on a product listing page shows both variants performing poorly compared to historical data; performance analysis reveals the new testing platform's targeting script requires 850ms of execution time, delaying Time to Interactive and causing a 12% increase in bounce rate across all test groups.
Secondary Action: A Secondary Action is an alternative operation that a user can take on a webpage apart from the primary goal or action. This can be actions like "Save for later," "Add to wishlist," or "Share with a friend." While the primary action is usually tied to conversions such as making a purchase or signing up, secondary actions are still important as they can lead to future conversions or drive other valuable behaviors on your site. It's a way of keeping users engaged even if they're not ready for the primary action yet.
Segmentation is the process of dividing your audience or customer base into distinct groups based on shared characteristics, such as age, location, buying habits, interests, and more. By segmenting your audience, you can create more targeted and personalized marketing campaigns that better address the needs and wants of specific groups, leading to higher engagement and conversion rates.
Server latency is the time delay between when a server receives a request and when it begins sending a response, representing the duration required for the server to process the request. It measures server-side processing efficiency independent of network transmission time.
Server latency is a component of overall Time to First Byte and depends on factors like server computational resources, database query efficiency, application code optimization, and current server load. High server latency can result from complex database queries, inefficient code, insufficient server resources, or processing bottlenecks. It's distinct from network latency, which measures transmission time across network infrastructure.
In A/B testing, server latency differences between variations can confound test results by introducing performance disparities unrelated to the design or content changes being tested. If a server-side implementation causes one variation to have higher latency, any observed conversion differences may reflect page speed impact rather than the actual changes being tested. Monitoring server latency ensures test integrity and accurate attribution of results.
A test comparing two product recommendation algorithms shows the new algorithm performing 12% worse. Investigation reveals the new algorithm's complex calculations add 600ms of server latency per page load, causing the performance drop. The test actually measured server performance, not the quality of recommendations, requiring optimization before valid testing can occur.
Server-Side Testing is a type of A/B testing where the test variations are rendered on the server before the webpage or app is delivered to the user's browser or device. This type of testing allows for deeper, more complex testing because it involves the back-end systems, and it's particularly useful for testing performance optimization changes such as load times or response times.
Shopify is a fully-hosted, subscription-based e-commerce platform that enables businesses to create and manage online stores without handling technical infrastructure.
As one of the leading SaaS e-commerce solutions, Shopify serves over 4 million merchants worldwide with an all-in-one platform including hosting, security, payments, and store management. The platform offers various pricing tiers and uses a templated approach with customizable themes and apps from its extensive marketplace. Shopify handles all technical maintenance, updates, and security, allowing merchants to focus on selling.
Shopify's closed ecosystem and theme-based architecture create specific constraints and opportunities for A/B testing and CRO implementation. Testing strategies must work within Shopify's Liquid templating system and app ecosystem, with some testing methods requiring specific Shopify-compatible tools. The platform's standardized checkout process limits experimentation opportunities on Shopify's lower tiers, making understanding these limitations crucial for optimization planning.
A CRO specialist implements split testing on a Shopify store's product pages using a Shopify app integration, but must upgrade to Shopify Plus to run experiments on the checkout page due to platform restrictions on standard plans.
Significance Level: The significance level, often denoted by the Greek letter alpha (α), is the threshold a p-value must fall below for a result to be considered statistically significant. It determines whether you should reject or fail to reject the null hypothesis in hypothesis testing. In simpler terms, it's the probability of rejecting the null hypothesis when it is actually true, thus leading to a Type I error. Commonly used significance levels are 0.05 (5%) and 0.01 (1%).
Simpson’s Paradox is a statistical phenomenon in which a trend appears in several different groups of data but disappears or reverses when those groups are combined. It occurs when the relationship between variables is influenced by a hidden or confounding variable that changes the overall outcome when data is aggregated.
Simpson’s Paradox often arises in real-world datasets where variables are not independent—for example, when different groups have uneven sample sizes, or when a third variable affects both the grouping and the outcome. When data is viewed at the aggregate level, the confounding variable can distort the overall trend, producing a conclusion that contradicts the subgroup trends.
This phenomenon is especially relevant in product analytics, experimentation, medical research, social science, and any field where decisions rely on segmented vs. aggregated data.
Simpson’s Paradox can lead teams to draw incorrect conclusions from experiments or analyses if they rely solely on aggregated results.
In A/B testing, it may cause a variant to appear to win overall while losing in every major segment—or vice versa—if underlying user distributions shift or if segments behave differently.
Recognizing and checking for Simpson’s Paradox is essential for interpreting segmented versus aggregated results correctly, spotting traffic or sample-size imbalances between variants, and avoiding decisions driven by a confounding variable rather than by the change being tested.
This awareness helps teams distinguish true causal effects from artifacts of data aggregation.
A product team runs an A/B test on a new onboarding flow. In aggregate, Variant B shows a lower completion rate than Variant A, yet when the results are broken down by segment, Variant B matches or beats Variant A in every individual segment.
Upon investigation, the team discovers that one segment with significantly lower completion rates is overrepresented in Variant B due to random imbalance. When weighted properly, the paradox disappears—confirming that Variant B is actually better.
This example illustrates how Simpson’s Paradox can obscure true performance unless segmented analysis is performed alongside aggregate metrics.
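The effect is easy to reproduce with small invented numbers. In the sketch below, Variant B wins inside both segments yet loses in aggregate because its traffic skews toward the harder segment:

```python
# Hypothetical onboarding completions per (variant, segment): (completed, users)
data = {
    ("A", "returning"): (600, 2_000),
    ("A", "new"):       (80,  1_000),
    ("B", "returning"): (320, 1_000),
    ("B", "new"):       (180, 2_000),
}

for segment in ("returning", "new"):
    rate_a = data[("A", segment)][0] / data[("A", segment)][1]
    rate_b = data[("B", segment)][0] / data[("B", segment)][1]
    print(f"{segment:>9}: A {rate_a:.1%} vs B {rate_b:.1%}  -> B wins")

for variant in ("A", "B"):
    done = sum(c for (v, _), (c, n) in data.items() if v == variant)
    users = sum(n for (v, _), (c, n) in data.items() if v == variant)
    print(f"aggregate {variant}: {done / users:.1%}")
# Aggregate: A 22.7% vs B 16.7% -- A appears to win overall even though
# B has the higher completion rate in every segment.
```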
Split URL Testing, sometimes called redirect testing, is a form of A/B testing in which the variation lives at its own separate URL rather than being modified in place on the original page. In this test, the traffic to your website is divided between the original webpage (version A) and the alternative page at a different URL (version B) to see which one leads to more conversions or achieves your designated goal more effectively. The page that achieves the higher conversion rate is typically the winner. This type of testing is especially useful for evaluating larger redesigns and making decisions about significant changes to your website.
Mida is 10X faster than anything you have ever considered. Try it yourself.