# Sample Size and Sampling

## What is Sample Size?

Your sample size is the number of people taking part in a specific scientific experiment. It can be the difference between a result you can act on and a result you can’t.

If you flip a coin 3 times, you could get 3 heads in a row. Would you then assume that the next flip will also be a head? We all know this is not true, yet we often apply the same bias when testing.

The truth, of course, is that there is still an equal chance of getting a head or a tail on the next flip. It is just coincidence that the first three flips came up heads. If you flipped it another 100 times, the proportion of heads would most likely be much closer to 50%. This is why sample size matters.
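A short simulation makes the point concrete. This minimal sketch flips a simulated fair coin different numbers of times and prints the share of heads; the function name `proportion_heads` is just for illustration, and the seed is fixed only so the run is repeatable (a different seed gives different small-sample results, which is exactly the point).

```python
import random


def proportion_heads(n_flips: int, rng: random.Random) -> float:
    """Flip a simulated fair coin n_flips times and return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(42)  # fixed seed so the output is repeatable
for n in (3, 10, 100, 10_000):
    print(f"{n:>6} flips: {proportion_heads(n, rng):.1%} heads")
```

Small runs can land far from 50%; the larger runs settle close to it.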

The problem is that many experimenters start tests that are already doomed to fail because the sample size is inadequate. This isn’t restricted to conversion rate optimisation either; Daniel Kahneman, author of *Thinking, Fast and Slow*, found that it affects many academics and scholars too.

A sample is a representative group of individuals from a specific population. In CRO, this is essentially a selection of the users who visit your website. It can be segmented by channel, such as PPC or Social, or by user type, such as new or returning users.

## Why is it important?

Sampling is important to ensure the reliability and validity of an experiment.

### Reliability

Reliability is how consistent and stable the results are: “If we ran this again, would we get the same results?” If a test is unreliable, we cannot be confident that implementing the winner will have the desired outcome.

With an insufficient sample size, we cannot claim a test is reliable. Going back to the coin example, Derren Brown highlighted the problem with small samples in his show “The System”. He demonstrated that it is entirely possible to get 10 heads in a row, but that does not mean the coin will always land on heads.

He went on to explain that this took him the better part of 8 hours. If those 10 flips were our only sample for estimating the odds of a coin landing on heads, we would conclude the odds were 100%, and we would be wrong.
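The odds of that streak can be worked out directly. Each flip of a fair coin is independent, so the chance of ten heads in a row is (1/2) multiplied by itself ten times. A minimal sketch (the function name is illustrative):

```python
def prob_all_heads(n_flips: int) -> float:
    """Probability that a fair coin lands heads on every one of n_flips independent flips."""
    return 0.5 ** n_flips


p = prob_all_heads(10)
print(f"P(10 heads in a row) = {p:.6f}, about 1 in {round(1 / p)}")
```

That is roughly one success in every 1,024 attempts at ten flips, which is why it takes hours of filming to catch one.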

A small sample is easily influenced by outliers, which skew the overall result. The more data you collect, the less these outliers affect it.
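Averages show this clearly. The sketch below uses made-up order values, not real data: every typical order is worth 50 and there is a single unusually large order of 1,000. As the number of typical orders grows, that one outlier moves the average less and less.

```python
from statistics import mean


def mean_with_outlier(n_typical: int, typical: float = 50.0, outlier: float = 1000.0) -> float:
    """Average of n_typical identical order values plus one outlier (hypothetical numbers)."""
    return mean([typical] * n_typical + [outlier])


for n in (9, 99, 999):
    print(f"{n + 1:>5} orders: average order value = {mean_with_outlier(n):.2f}")
```

With 10 orders the outlier nearly triples the average (145.00 instead of 50); with 1,000 orders it adds less than 1 to it.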

### Validity

Validity is whether the test measures what we want it to. If we use the wrong measuring tools, such as the wrong sample, our test results cannot claim to show what they say they show.

Without a representative sample your results become invalid, and without the right sample size they become unreliable. If you want to test PPC landing pages, test them on users coming from PPC.

## How do you Identify your Sample?

While creating your hypothesis, you should have identified the audience you want to target. Keep this front of mind throughout this exercise.

Now that you know who you want to target, you need to know how big this population is: how many of your target audience view the page you want to test?

You can find this in Google Analytics by applying a segment for your audience (e.g. PPC users) and filtering to the page you want to test (e.g. the basket page). We recommend looking at a four-week period to smooth out differences in performance between the beginning and end of the month, and between days of the week. This is not perfect, but it comes close.
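If you work from an exported report rather than the Google Analytics interface, the same filtering can be sketched in a few lines. Everything here is hypothetical: `DailyRow` is an invented row format, not a real Google Analytics export schema, and the traffic figures are made up. A 28-day window is used because it contains each day of the week exactly four times.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DailyRow:
    """One row of a hypothetical analytics export (not a real Google Analytics schema)."""
    day: date
    channel: str  # e.g. "PPC", "Social"
    page: str     # e.g. "/basket"
    users: int


def segment_users(rows: list[DailyRow], channel: str, page: str,
                  end: date, days: int = 28) -> int:
    """Sum daily users for one channel and page over the `days` days ending on `end`.

    Summing daily counts double-counts people who visit on several days,
    so treat the result as an upper bound on unique users.
    """
    start = end - timedelta(days=days - 1)
    return sum(
        r.users for r in rows
        if r.channel == channel and r.page == page and start <= r.day <= end
    )


# Made-up data: 35 days of PPC and Social traffic to the basket page.
end = date(2024, 3, 31)
rows = [
    DailyRow(end - timedelta(days=i), channel, "/basket", users)
    for i in range(35)
    for channel, users in (("PPC", 120), ("Social", 40))
]
print(segment_users(rows, "PPC", "/basket", end))
```

If your analytics tool can report unique users for the whole period directly, prefer that figure over a sum of daily counts.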

You then need the baseline value of the metric you are measuring for this audience. It could be bounce rate, ecommerce conversion rate or form enquiry rate.

Now you should have the following:

- Number of users
- Baseline metric

Using these two numbers, we can identify the minimum detectable difference: the smallest change that would count as statistically significant. You will also need to choose the confidence level you want, for example 90% or 95%; we always recommend 95% for the most confidence.
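As a rough illustration of the calculation, here is a minimal Python sketch using the standard normal-approximation formula for comparing two conversion rates. It makes assumptions the text above does not: traffic is split 50/50 between two variants, the test is two-sided, and statistical power (the chance of detecting a real difference of that size) is set to 80%, a common default. The function name is illustrative, and online calculators may use slightly different formulas, so expect their numbers to differ a little.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(users_per_variant: int, baseline_rate: float,
                              confidence: float = 0.95, power: float = 0.80) -> float:
    """Smallest absolute change in a conversion rate a two-variant test can reliably detect.

    Normal approximation for two proportions, assuming an even traffic split and
    that both variants share the baseline variance.
    """
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    std_err = sqrt(2 * baseline_rate * (1 - baseline_rate) / users_per_variant)
    return (z_alpha + z_beta) * std_err


# Example: 20,000 users over four weeks, split into two variants, 3% baseline rate.
mde = minimum_detectable_effect(users_per_variant=10_000, baseline_rate=0.03)
print(f"Absolute: {mde * 100:.2f} percentage points; relative: {mde / 0.03:.1%}")
```

With these example figures the variant would need to move the conversion rate by about 0.68 percentage points (a relative lift of roughly 22.5%) before the test could reliably call it.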

Here are some tools you can use to work out the difference needed for a result to be significant:

- http://www.raosoft.com/samplesize.html
- https://www.surveysystem.com/sscalc.htm
- https://www.checkmarket.com/sample-size-calculator/
- https://www.calculator.net/sample-size-calculator.html
- https://www.abs.gov.au/websitedbs/D3310114.nsf/home/Sample+Size+Calculator

## What to do if you have too small a sample?

A small sample means you need a larger detectable difference for the results to be statistically valid. If your sample is too small and you need a big improvement, you may want to rethink your testing strategy.

The good news is that there are ways to test even with a sample this small.

### Increase the length of time

If four weeks doesn’t give you enough data to run your test, try running it over eight weeks. This is not ideal and never our first recommendation, but if you want to ensure validity and reliability in your testing, you will need more data.
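It helps to know how much extra time actually buys. Assuming weekly traffic stays roughly constant, the sample grows in proportion to run time, and the minimum detectable difference shrinks with the square root of the sample size. This minimal sketch (function name illustrative) rescales a known detectable difference for a longer run:

```python
from math import sqrt


def mde_after_extending(current_mde: float, current_weeks: float, new_weeks: float) -> float:
    """Rescale a minimum detectable difference for a longer (or shorter) test.

    Assumes constant weekly traffic, so sample size is proportional to run time
    and the detectable difference scales with 1 / sqrt(sample size).
    """
    return current_mde * sqrt(current_weeks / new_weeks)


# Example: a 20% relative lift is needed after 4 weeks; what about 8 weeks?
print(f"{mde_after_extending(0.20, current_weeks=4, new_weeks=8):.1%}")
```

Doubling the run from 4 to 8 weeks only brings the required lift down from 20% to about 14.1%, a reduction of roughly 29%, so extra time gives diminishing returns.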

The length of time you are willing to run a test for will depend on your business. We recommend not pursuing tests that take more than 6 weeks. Always try to focus on the areas of your site with enough traffic to run more tests, as these will have a greater impact on your business.
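To check a test idea against a time limit like this before committing to it, you can estimate how many weeks it needs. This sketch inverts the same normal-approximation formula for two conversion rates, again assuming a 50/50 split, a two-sided test and 80% power; the function name and the example traffic figures are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def weeks_needed(weekly_users: int, baseline_rate: float, relative_lift: float,
                 confidence: float = 0.95, power: float = 0.80) -> int:
    """Estimate the whole number of weeks a 50/50 A/B test needs to detect relative_lift."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2) + NormalDist().inv_cdf(power)
    delta = baseline_rate * relative_lift  # absolute lift, e.g. 3% -> 3.3% is 0.003
    per_variant = 2 * baseline_rate * (1 - baseline_rate) * (z / delta) ** 2
    return ceil(2 * per_variant / weekly_users)  # two variants share the weekly traffic


# Hypothetical page: 5,000 users a week, 3% baseline conversion rate.
for lift in (0.10, 0.20, 0.30):
    print(f"{lift:.0%} relative lift: {weeks_needed(5_000, 0.03, lift)} weeks")
```

With these made-up numbers, a 10% lift would take 21 weeks, far beyond a 6-week limit, while a 20% lift fits within 6 weeks. Because the required sample grows with the square of the shrinking lift, halving the lift you want to detect roughly quadruples the time needed.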

### Change the point of conversion

Another way to improve the chance of statistical significance is to change the point of conversion. Tracking enquiries or sales is the ultimate aim, but this may not be practical if you only get a few of them a week.

Instead, look for softer metrics that occur more often but still indicate performance; bounce rate, pages/session and add-to-baskets are just a few.

Because these events occur more often, a smaller relative increase is enough to reach significance, so you are more likely to reach it.
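The effect is easy to quantify for rate-based metrics. This sketch computes the relative lift needed for significance at two baseline rates with the same traffic, using the normal approximation for two proportions with a 50/50 split, a two-sided test and 80% power (all assumptions). The 2% "sale" and 30% "add to basket" rates are made-up examples. Metrics such as pages/session are averages rather than rates, so this formula does not apply to them directly.

```python
from math import sqrt
from statistics import NormalDist


def relative_mde(users_per_variant: int, baseline_rate: float,
                 confidence: float = 0.95, power: float = 0.80) -> float:
    """Relative lift in a conversion rate needed for a two-variant test to detect it."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2) + NormalDist().inv_cdf(power)
    absolute = z * sqrt(2 * baseline_rate * (1 - baseline_rate) / users_per_variant)
    return absolute / baseline_rate


# Same hypothetical traffic, two different points of conversion.
for metric, rate in (("sale", 0.02), ("add to basket", 0.30)):
    print(f"{metric:>13} ({rate:.0%} baseline): needs {relative_mde(5_000, rate):.1%} lift")
```

With 5,000 users per variant, the sale metric needs a lift of about 39%, while the add-to-basket metric needs only about 8.6%.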

### Be more radical

Finally, if you need a 50% increase in conversion rate to reach significance, then changing a headline or removing a field from a form is unlikely to deliver it.

Being more radical is more likely to produce a large change in performance, but it comes with additional risks.

If your site is young and has limited data, this is the way to test. Once you get some big wins, you can iterate on those versions to fine-tune them and understand which elements are really having an impact.