The post How To Perform Descriptive Statistics In Microsoft Excel appeared first on Top Tip Bio.

]]>A useful feature in Excel that many people do not know about is the Data Analysis ToolPak.

The Data Analysis ToolPak is an add-on created by Microsoft to make it easier to perform different data analysis procedures.

To ensure you have the Data Analysis ToolPak activated correctly, go to *File*>*Options*.

Then select *Add-ins*.

At the bottom, where it says *Manage*, ensure you select *Excel Add-ins* and click *Go*.

Ensure you have the *Analysis ToolPak* option ticked, and click the *OK* button.

Now, when you select the *Data* tab at the top, you should see a *Data Analysis* button appears.

Now we’re ready to perform the descriptive statistics.

To perform descriptive statistics in Excel, go to *Data*>*Data Analysis*.

Then, from the list, select *Descriptive Statistics*.

This should open a new window.

For the input range, this is where you enter the range of cells containing your data.

Next, need to tell Excel how your data are entered in your sheet.

If you have the values stacked in a column, then these will be grouped by *columns*. If they were entered into rows instead, then select the *rows* option.

If you also have labels at the top of your data in the first row, then ensure you select *Labels in First Row*.

For the output options, this is where you specify where you want the descriptive statistics to be returned

There are three options:

**Output Range**– Enter a specific cell in this same worksheet where you want the results to be entered**New Worksheet Ply**– Enter the results in a new worksheet**New Workbook**– Enter the descriptive statistics in an entirely new Excel file

Underneath, you can tick the various descriptive statistic options that you want to perform. At a minimum, you will want to select the *Summary Statistics*. The *Confidence Level for Mean*, *Kth Largest* and *Kth Smallest* options are optional, and I will explain more about these later.

So that’s the setup done, now press the *OK* button to perform the descriptive statistics.

Below are some descriptive statistics I have based on some example data. I’ll now go over and explain each output.

The first result is the mean, or average, value.

This is where you add all the values up in your sample and divide by the number of values in the sample.

To calculate the mean value youself, you can use the AVERAGE function.

=AVERAGE(data)

The standard error is a measure of the variability of sample means in a sampling distribution of means.

The higher the standard error, the higher the variability.

You can calculate the standard error yourself by taking the standard deviation and dividing it by the square root of the count.

=STDEV.S(data)/SQRT(COUNT(data))

Say we sorted our data in ascending order from smallest to largest. The median will be the number that lies in the middle of the values, in my case, the answer is 50.3.

You can calculate the median separately by using the MEDIAN function.

=MEDIAN(data)

The mode represents the value that appears the most in the sample

The mode for my example is 49.8 since this value appears twice.

If you want to calculate this yourself, you can use the MODE.SNGL function.

=MODE.SNGL(data)

The standard deviation is a measure of the amount of variation in the data relative to the mean, where the higher the standard deviation, the higher the variability.

If you would like to calculate the standard deviation of a sample separately from the descriptive statistics, then use the STDEV.S function.

=STDEV.S(data)

The variance is simply the square of the standard deviation.

The Excel function for the variance of a sample is VAR.S.

=VAR.S(data)

Kurtosis is a measure that defines how heavily the tails of a distribution differ from the tails of a normal distribution.

In Excel, a normal distribution has a kurtosis value of 0, positive values indicate a relatively peaked distribution and negative values indicate a relatively flat distribution.

You can use the KURT function to calculate the kurtosis separately.

=KURT(data)

Skewness* *is a measure of asymmetry of the data distribution.

A value of 0 indicates a perfectly symmetrical distribution.

If the value is between -1 and 1, but isn’t 0, then the data are considered fairly skewed. If the value is outside this range, then the data are highly skewed.

The function for skewness is SKEW.

=SKEW(data)

The range is simply the difference between the smallest and largest values in the data.

The minimum is the smallest number in the data.

You can use the MIN function to work this out yourself.

=MIN(data)

As the name suggests, the maximum is the largest number in the data.

To calculate this separately, use the MAX function.

=MAX(data)

The sum is the total calculated when you add up all the values in the data.

Use the SUM function to do this yourself.

=SUM(data)

The count is the total number of values you have in your data.

Use the COUNT function to calculate the count separately.

=COUNT(data)

During the descriptive statistics setup, if you selected the *Kth Largest* option, then you’ll also have this result too.

Say you have 1 in the box next to the *Kth Largest* option, this would mean the first largest value in the data will be returned – in other words, the maximum.

If a 2 was used instead, then the second largest value will be returned, and so on and so forth.

Similar to the above, if you selected the *Kth Smallest* option during the setup, then you’ll also have this result returned.

Having 1 entered during the setup will mean the first smallest value will be determined – the minimum value.

Again, changing this input to a 2 would mean the second smallest value is returned instead.

The last entry in the descriptive statistics table is the confidence interval value.

This is the value you can add and subtract from the mean to calculate the upper and lower confidence intervals, respectively. Specifically, these will be the 95% confidence intervals, since 95 was entered during the setup.

If you want to learn more about calculating confidence intervals in Microsoft Excel, then check out this post.

You should now know how to calculate some descriptive statistics in Excel.

Using the Data Analysis ToolPak makes this process so much easier and quicker than using separate functions.

Microsoft Excel version used: 365 ProPlus

The post How To Perform Descriptive Statistics In Microsoft Excel appeared first on Top Tip Bio.

]]>The post How To Create A QQ Plot In Microsoft Excel appeared first on Top Tip Bio.

]]>QQ plots are great visual aids to inspect the distribution of your data. Most commonly, QQ plots are used to see if the data follows a normal distribution.

Here’s a sneak peak at the end product.

In this example, I have a sample containing 49 different data points. These data points have been entered into the first column of my Excel sheet.

What I want to do for this example is to create a QQ plot in Excel to determine if my sample data has a normal distribution.

The first step to create a QQ plot in Excel is to rank the data in ascending order (from smallest to largest). This is really easy to do with the RANK AVERAGE function.

=RANK.AVG(number, ref, [order])

**number**– The cell containing the data point you want to rank**ref**– The range of cells containing the complete data**[order]**– Enter 1 to rank the cell in ascending order

Here’s what the formula looks for my example.

=RANK.AVG(A2,$A$2:$A$50,1)

Notice that I have also included a $ symbol before the column letters and row numbers in the ref part of the formula. This is because I want these particular cells to remain constant when I copy the formula down.

Once running this formula, you need to copy the formula down to repeat the process for all the data points.

You should be left with a ranking order of your data.

For the next step, you need to calculate the percentile value of the ranks.

To do this, you simply take the rank of the data point and subtract 0.5 from it. You then divide this answer by the number of data points in your sample.

Here’s an overview for what the formula will look like in Excel.

=(rank-0.5)/COUNT(data)

**rank**– The cell containing the rank**data**– The range of cells containing the complete data

And here’s what this looks like for the first rank in my data.

Within the COUNT function, notice that the range of cells are also locked (contain the $ symbols).

As with the first step, you want to repeat the function so that all the percentile values for all your ranks are calculated.

The next step to calculating the QQ plot in Excel is to work out the normal theoretical quantiles.

Specifically, these quantiles are Z-scores based on a normal distribution, where the mean is 0 and the standard deviation is 1.

To do this, I will use the NORM.S.INV function.

=NORM.S.INV(probability)

**probability**– The cell containing the percentile

Simply add in the cell containing the percentile values calculated in the previous step.

Now we have the normal theoretical quantiles, the final calculations we need are the Z-scores for the quantiles based on the original data.

To do this, I will use the STANDARDIZE function to create Z-scores.

=STANDARDIZE(x, mean, standard_dev)

**x**– The cell containing the data point**mean**– The average value of the data**standard_dev**– The standard deviation of the data

Note, calculating Z-scores in Excel is discussed in more detail in this post.

For the mean and standard_dev parts of the formula above, you can use the AVERAGE and STDEV (or STDEV.S) functions, respectively.

Here’s what the formula looks like for my first data point in my example.

=STANDARDIZE(A2,AVERAGE($A$2:$A$50),STDEV($A$2:$A$50))

Again, the $ symbols are included to lock the range of cells inside the AVERAGE and STDEV functions. The formula is then copied down to calculate the Z-scores for all my data.

Now we have everything we need to create the QQ plot in Excel.

The QQ plot is simply a scatter plot with the normal theoretical quantiles (X axis) against the data quantiles (Y axis).

To create the plot, go to *Insert>Insert Scatter>Scatter*.

One thing you will probably want to do is adjust the axes, so that they are not placed in the middle of the graph.

To do this, right-click on the graph and select *Format Chart Area*.

Use the dropdown menu to select either *Horizontal (Value) Axis* or *Vertical (Value) Axis*.

In the *Axis Options*, I recommend adjusting where the axis crosses by defining your own *Axis value*.

A common feature of a QQ plot is to add a linear trendline to the graph to make it easier when interpreting the results.

To do this, with the graph selected, go to *Chart Design>Add Chart Element>Trendline>Linear*.

To interpret the QQ plot, you want to look at the data points on the graph and how they fit on the linear line.

If the data has a completely normal Gaussian distribution, then all data points will fit perfectly on the linear line. The data will also follow the linear line in a 45 degree angle.

Looking at my example, I can see that the majority of my data points are either on or are close to the linear line.

So, I’m fairly confident that I have an approximately normal or Gaussian distribution.

It’s also worth plotting a frequency histogram to explore normality further.

In this tutorial, I have shown you how to create a QQ plot in Microsoft Excel. I’ve also shown to you to interpret the results of the plot.

To create a QQ plot to assess data normality, you must manually calculate the normal theoretical quantiles and plot these in a scatter plot against the actual data quantiles.

Microsoft Excel version used: 365 ProPlus

The post How To Create A QQ Plot In Microsoft Excel appeared first on Top Tip Bio.

]]>The post How To Do Basic Math In Excel (Add, Subtract, Multiply & Divide) appeared first on Top Tip Bio.

]]>So, let’s get to it.

The most basic way of adding cells is to simply do this manually by using the + operation.

=Number1+Number2

You simply replace Number1 and Number2 with the numbers your interested in, or cells containing your data.

In my example, I have two cells containing the numbers 5 and 15, which I want to add together.

To do this in a new cell, I can enter the following formula.

=B1+B2

A great thing about Excel is that it includes a function that can add a range of cells quickly for you.

To do this, use the SUM function.

=SUM(Number1, [number2],...)

Within the parentheses, simply insert the range of cells you’re interested in.

If I use the same example as shown above, I will use the following function.

=SUM(B1:B2)

To subtract numbers or cells in Excel, then use the – operation.

=Number1-Number2

Replace Number1 and Number2 with the numbers your interested in, or cells containing your data.

In my example below, I have two cells containing the numbers 12 and 5. I want to subtract 5 from 12.

To do this, in a new cell, I will enter the following.

=B1-B2

If you want to multiply certain numbers or specific cells together, then the easiest way to do this is by using the * operation.

=Number1*Number2

In this example, I have two cells (B1 and B2) that contain the numbers 5 and 20, repectively.

If I want to multiply these cells, I will use the following formula.

=B1*B2

If you have a range of numbers or cells that you want to multiply together, then you can use the PRODUCT function.

=PRODUCT(Number1, [number2],...)

Within the parentheses, simply insert the range of cells you’re interested in.

For my example, the formula will look like this.

=PRODUCT(B1:B2)

Finally, if you want to divide cells in Excel, then use the / operation.

=Number1/number2

Replace number1 and number2 with the cells you’re interested in.

For my example below, I have cells containing the values 75 (B1) and 25 (B2). If I wanted to divide 75 by 25, I will enter the following formula.

=B1/B2

You should now have a better understanding of performing basic math (add, subtract, multiply and divide) in Microsoft Excel.

The table below summarises the operations you will need to perform each task.

Calculation | Operation |
---|---|

Add | + |

Subtract | – |

Multiply | * |

Divide | / |

If you want to add a lot of cells, then use the SUM function.

Additionally, if you want to multiply a range of cells, then use the PRODUCT function.

Microsoft Excel version used: 365 ProPlus

The post How To Do Basic Math In Excel (Add, Subtract, Multiply & Divide) appeared first on Top Tip Bio.

]]>The post How To Calculate The Standard Deviation (Clearly Explained!) appeared first on Top Tip Bio.

]]>The standard deviation is basically used to show how spread out the data are, relative to the mean.

A low standard deviation indicates that the data are close to the mean, whereas a high standard deviation indicates that the data are spread further away from the mean.

For this example, let’s say I’ve measured the weight of 10 female bulldogs of a similar age. Below are their weights, measured in kilograms.

Dog | Weight (kg) |
---|---|

1 | 20.4 |

2 | 25.2 |

3 | 20.5 |

4 | 23.2 |

5 | 22.6 |

6 | 25.9 |

7 | 23.4 |

8 | 22.5 |

9 | 22.1 |

10 | 21.0 |

This randomly selected group of female bulldogs is known as a sample.

The first thing you need to do to calculate the standard deviation of the sample is to work out the average value.

To do this you simply add up all the values in your sample, and divide the answer by the number of values in the sample.

In my example, adding up all the values comes to 226.8 kg.

Then, since I have 10 individual dogs in my sample, I will divide this by 10 to give me an average weight of 22.7 kg.

Now we have the average weight, the next step is to work out how far away these individual values are from the average value.

So, let’s take the first dog, which weighed 20.4 kg.

To work out the difference, simply do 20.4 – 22.7.

The answer is -2.3 kg.

For the next dog, the calculation will be 25.2 – 22.7, which comes to 2.5 kg.

And the process is then repeated for all values in the sample.

Once we have these values, the next step is to square each one.

So, for the first dog, this will be -2.3 kg squared.

This answer is 5.29 kg^{2}.

The next dog will be 2.5 kg squared, which comes to 6.25 kg^{2}.

And again, the process is repeated for all values.

The next step is rather simple, we need to add up all these squared values we just calculated.

So, doing so for my example will give a value of 30.66 kg^{2}.

The next step for calculating the standard deviation of a sample is to divide this value by what is known as n – 1.

In this case, n is the number of values in the sample, in other words, the number of dogs for my example.

Again, this is 10.

So, 10 – 1 = 9.

This means I need to do 30.66/9.

Doing so gives a value of 3.4 kg^{2}.

This value right here is known as the variance.

To go from the variance to the standard deviation, you simply take the square root of the variance value.

So, for my example, this would be the square root of 3.4.

Doing so, gives a standard deviation of 1.8.

And since the standard deviation is expressed in the same units as the original measurements, I can say this is 1.8 kg.

So, for my sample, the average weight of the female bulldogs was 22.7 kg, with a standard deviation of 1.8 kg.

Usually, this is written as the mean value ± the standard deviation; in my example this would be 22.7 ± 1.8 kg.

If you’re wondering what the complete equation is to calculate the standard deviation, as described above, then the below image shows the standard deviation formula for a sample and a population.

You’ll notice a slight difference between the two formulae; specifically, this is the denominator.

The standard deviation for a sample has n – 1, whereas the population formula has just N.

The n – 1 part in the sample calculation is known as Bessel’s correction.

It is simply used to correct for bias during the calculation, since the sample is just a randomly selected number of subjects from the overall population. When calculating the population standard deviation, no correction is required.

This is just something to keep in mind; however, most of the time, you will probably be only interested in calculating the standard deviation of the sample.

As you can see, when you break down the standard deviation formula, it’s actually quite straight forward!

You should now understand how to calculate the standard deviation of a sample and a population by hand.

Luckily, most statistical analysis software, including Microsoft Excel, has a function to do this for you.

The post How To Calculate The Standard Deviation (Clearly Explained!) appeared first on Top Tip Bio.

]]>The post How To Calculate A Weighted Average In Microsoft Excel appeared first on Top Tip Bio.

]]>The standard arithmetic average is the sum of the data values, divided by the number of data values in the dataset. In this calculation, each data value is treated equally.

On the other hand, the weighted average is the sum of each data value multiplied by their weight, divided by the sum of the weights. In this calculation, data values with higher weights have more influence over the final average, compared with values with lower weights.

For this tutorial, let’s say I’m a teacher at a school. In my class, each student if given marks out of 100 for 5 different tests:

- Coursework assignment 1
- Coursework assignment 2
- Group presentation
- Exam
- Laboratory practical

The scores displayed in this example are just for one student. So, this particular student scored 76 out of 100, or 76% on the exam.

Each test is given a different weight. Specifically, each coursework assignment are given weights of 15%, the group presentation 10%, the exam 40% and the laboratory practical 20%.

I’ll now show you two methods to calculate the weighted average in Excel. The first method is slightly long, but I will show you as it helps to understand the formulas involved. The second method uses the SUMPRODUCT function to quickly calculate the weighted average.

To calculate the weighted average, you firstly have to multiple each score by its weight.

For example, for the coursework assignment 1, I will multiply 56 by 15. This is then repeated for all the tests.

The image below shows the formulas used for my example.

Next, simply add up all the values calculated from step 1.

You can easily do this by using Excel’s SUM function.

=SUM(number1, [number2], ...)

Remember to add in the desired cell range within the brackets.

So, for my example, the formula will be =SUM(D2:D6). Doing so gives a value of 6805.

Again, by using the SUM function, add up all the weights.

For my example, I will use the formula =SUM(C2:C6).

I get a value of 100.

Finally, to calculate the weighted average, you need to divide the value from step 2 by the sum of all weights (answer from step 3).

Doing this for my example means I divide 6805 by 100, which comes to 68.05.

Since my score is out of 100, I can say that this student had a weighted average of 68.05%.

This first method I have just demonstrated is a slightly long-winded approach. I purposely did this to demonstrate how to calculate the weighted average manually.

I’ll now show you how to do this quickly, by taking advantage of Excel’s SUMPRODUCT and SUM functions.

The SUMPRODUCT essentially performs steps 1 and 2 from the first method described above.

In a new cell, simply use the following formula.

=SUMPRODUCT(array1, [array2], [array3], ...)/SUM(number1, [number2], ...)

- SUMPRODUCT – Within the brackets, replace array1 with the range of cells containing the score, and replace array2 with the range of cells containing the weights
- SUM – Within the brackets, insert the range of cells containing the range of cells containing the weights

For my example, the formula will be =SUMPRODUCT(B2:B6,C2:C6)/SUM(C2:C6).

Now you know how to calculate the weighted average by using Microsoft Excel.

The first method describes the step-by-step approach to calculating the weighted average.

The second method uses the SUMPRODUCT and SUM functions to calculate the weighted average in a single step.

Microsoft Excel version used: 365 ProPlus

The post How To Calculate A Weighted Average In Microsoft Excel appeared first on Top Tip Bio.

]]>The post How To Perform Random Sampling In Microsoft Excel appeared first on Top Tip Bio.

]]>Random sampling is a selection technique used where you take a population and randomly select a wanted number subjects to make a smaller group known as a sample.

During random sampling, each subject has an equal chance of being selected in the sample.

For the first method, I wil perform random sampling my using Excel’s Analysis ToolPak add-on. The Analysis ToolPak is an additonal set of options that can help you perform certain statistical functions in Excel easily.

To ensure you have the Analysis ToolPak activated correctly, go to *File*>*Options*.

Then, select *Add-ins*.

At the bottom, where it says manage, ensure you select *Excel add-ins* and click *Go*.

Ensure you have the *Analysis ToolPak* option ticked, and click the *OK* button.

Now, when you select the *Data* tab at the top, you should see a *Data Analysis *button appears.

Now we’re ready to perform the sampling. To do this, select the *Data Analysis* button.

Then, select *Sampling* from the list and click *OK*.

For the *Input* options, do the following.

**Input Range**– Enter the cells containing the data you are interested in from your population. Note, the cells entered here can only contain numeric values**Labels**– If you have highlighted a header in your Input Range selection, make sure you tick the labels option

Moving onto the *Sampling Method*, you want to select *Random*.

The *Number of Samples* is simply the number of randomly selected values from your population. For example, entering 10 will return 10 random values.

Lastly, there are three options to pick from to decide where you want the random sample to be entered.

**Output Range**– This will put the sample in the current sheet. You can use the box to specify a specific cell**New Worksheet Ply**– This will put the sample in a separate worksheet. Use the box to give the new worksheet a name**New Workbook**– The last option will put the sample in a new Excel file

Here’s what my Sampling window looks like for my example.

Finally, click *OK* to run the sampling.

You should now see the randomly select values from the original population.

The downside to using Method 1 is that you can only perform random sampling on cells containing numeric values. For Method 2, I will show you how to use the RAND function, which will enable you to randomly select different rows Therefore, you can have a mixture of cells containing numbers as well as characters in the random sample.

The RAND function creates a random number between 0 and 1, and you’ll see why this is useful for random sampling now.

In an empty column, enter the RAND function in each cell that is adjacent to your population data.

=RAND()

Notice that nothing is entered in the brackets.

A random number will now be seen. Note, everytime you perform something in Excel, this number will be recalculated. This isn’t a big issue, and I’ll show you how to manage this shortly.

Next, repeat this process for all cells that are adjacent to your population data.

To ensure the random values do not keep on updating everytime you do something in the sheet, simply highlight all the random numbers and then copy them (*right-click*>*Copy*).

Then, in the same cells, paste the values (*right-click*>*Paste Special*>*Paste Values*).

Now, the random function has been removed from the cells, leaving just the random numbers. This means the values will no longer update when ou make changes to our sheet.

Next, highlight all the columns containing the random numbers and your population data.

Then, go to *Home*>*Sort&Filter*>*Custom Sort*.

In the new window, choose to sort by the column containing the random numbers, and order these smallest to largest, then click *OK*.

Now, the data has been sorted according to the random numbers.

Decide on the number of values you want in your random sample. Then, take that number of values from the top of the sheet.

For example, I want to create a sample containing 10 values. I can simply highlight the first 10 rows in my population data copy and paste them to a new part in this sheet, a new sheet or a new Excel file.

You should now know two different ways of performing random sampling in Excel.

The first method uses the Analysis ToolPak to pick out a desired number of randomly selected cells containing numbers.

The second method takes advantage of the RAND function to randomise the subjects in the population, which can then be copied over to another sheet, for example.

Microsoft Excel version used: 365 ProPlus

The post How To Perform Random Sampling In Microsoft Excel appeared first on Top Tip Bio.

]]>The post How To Perform A Chi-Square Test Of Independence In Excel appeared first on Top Tip Bio.

]]>Let’s say I have a sample of 200 people that visited my local pub. From these 200 participants, half were male and half were female.

I asked each participant if they were a smoker or non-smoker. Here are my results:

Smokers | Non-smokers | |
---|---|---|

Male | 29 | 71 |

Female | 16 | 84 |

So, there were 29 males that smoked, and 71 that didn’t. For the females, there were 16 smokers and 84 non-smokers. Since these are the actual values from my experiment, they are known as the *observed* values.

What I want to do is to perform a chi-square test of independence to see if there is an association between gender and smoking status in my sample.

In the first part of this tutorial, I will show you how to manually perform the chi-square test in Excel, including calculating the chi-square statistic and p-value.

In the latter part of the tutorial, I will describe how to use an Excel function (CHISQ.TEST) to calculate the p-value quickly from the observed and expected values.

The first step to performing a chi-square test is to add up each of the rows and columns in the contingency table by using the SUM function.

=SUM(number1, [number2], ...)

So, to calculate the total in the smokers column, I will use the following formula in a new cell.

=SUM(B3:B4)

So, in total, there were 45 smokers.

I now need to repeat this process for the next column, as well as the rows in my table. Additionally, you need to calculate the overall total from your table.

The image below shows all of the formulas used for my example.

Moving on, you next need to work out the expected value for each entry in the table.

To work out the expected value, you must multiply each row total by each column total, and divide that answer by the overall total.

To work out the expected number of male smokers in my example, I will use the following formula.

=(D3*B5)/D5

So, in my example, the expected number of male smokers was 22.5.

Again, this process needs to be repeated for all entries in the contingency table.

The next step is to subtract each of the expected from the observed values, square it, then divide that answer by the expected value.

So, for my example, I will use the following formula for the male smokers.

=(B3-G3)^2/G3

This process needs to be repeated for the rest of the entries in the table.

Next, we need to calculate the chi-square statistic.

To do this, simply add up all the values that were recently calculated in step 3.

For my example, I will use the following formula.

=SUM(K3:L4)

So, the chi-square statistic for my example was 4.85, when rounded.

Next, we need to calculate the degrees of freedom.

Here, the degrees of freedom is calculated by firstly subtracting 1 from the number of rows in the test, and multiply this answer by the number of columns in the test subtract 1.

So, for my example, I have 2 rows and 2 columns. This means to work out my degrees of freedom, I use the following calculation (you can just perform this manually, as it’s very simple math).

(2-1) x (2-1)

Which gives an answer of 1. So, this example has a degrees of freedom of 1.

The final step in performing the chi-square test is to take the chi-square statistic and degrees of freedom values, and work out the p-value.

To do this in Excel, you can use the CHISQ.DIST.RT function.

=CHISQ.DIST.RT(x, deg_freedom)

**x**– The cell containing the chi-square value**deg_freedom**– The cell containing the degrees of freedom value

In my example, I get a p-value of 0.028, when rounded.

There is a function you can use to calculate the chi-square p-value by just using the observed and expected table values.

To do this, use the CHISQ.TEST function.

=CHISQ.TEST(actual_range, expected_range)

**Actual range**– The cells containing the observed values**Expected range**– The cells containing the expected values

To interpret the p-value, you need to state the my two hypotheses (null and alternative).

Here are my hypotheses for my example.

**Null hypothesis**– There is no association between gender and smoking status**Alternative hypothesis**– There is an association between gender and smoking status

If my alpha level, or significance threshold, was set at 0.05, this would mean I will fail to reject the null hypothesis if p>0.05. On the other hand, if p<0.05, I will reject the null hypothesis, and accept the alternative hypothesis.

In this case, my p-value was 0.028. Since this was less than 0.05, I will reject the null hypothesis, and accept the alternative hypothesis.

Therefore, there does seem to be an association between gender and smoking status.

After following this guide, you should now know how to perform a chi-square test of independence by using Microsoft Excel.

The steps described will calculate the chi-square statistic, degrees of freedom and the p-value. Alternatively, for a quick and easy way of generating a p-value, simply use the CHISQ.TEST function.

Microsoft Excel version used: 365 ProPlus

The post How To Perform A Chi-Square Test Of Independence In Excel appeared first on Top Tip Bio.

]]>The post How To Calculate Variance In Microsoft Excel appeared first on Top Tip Bio.

]]>To begin with, let me quickly explain what the variance is.

Variance is a measure of variability.

Simply, it is calculated by taking the average of squared deviations from the mean. The resulting value, known as variance, tells you the degree of spread in your data.

The larger the variance value, the more spread the data is in relation to the mean.

Calculating the variance in Excel will differ slightly depending on whether you want to calculate the variance for a sample or a population.

I’ll start by calculating the variance for a sample, which is probably what most people are interested in.

In my Excel sheet, I have some example data.

Let’s say I am interested in a rare fish called Fish X. What I’ve done is take a sample of 15 Fish X fishes and measured their length in centimetres.

To calculate the variance of a sample in Excel, use the VAR.S function.

=VAR.S(number1, [number2], ...)

Within the parenthesis, add in the cells containing your data.

Here is what my formula looks like for the Fish X example.

The variance for my sample is 389.924.

The units for the variance are the same as the original measurement squared; so my variance is in fact 389.924 cm^{2}.

If you’re wondering, you can calculate the standard deviation from the variance by simply square rooting the variance. You do this by using the SQRT function.

=SQRT(number)

- number – Replace this with the cell containing the variance value

The standard deviation comes out to be 19.7465.

The units for this will be the same as the original units for my data, so this would be centimetres.

I’ve got a different dataset now. This time, I have the data for all the Fish X fishes in the world; in total there are 100. So, what I have here is known as a population.

Calculating the variance of a population involves a slightly different function compared with the sample variance. This time, we will use the VAR.P function.

=VAR.P(number1, [number2], ...)

Again, within the parenthesis, add in the range of cells containing your data.

I get a result of 341.12.

Again, this will have the units cm^{2}.

Now you know how to calcuate the sample and population variance in Microsoft Excel.

To calculate the sample variance, use the VAR.S function. If you’re wanting to calculate the variance of a population, then use the VAR.P function instead.

Microsoft Excel version used: 365 ProPlus

The post How To Calculate Variance In Microsoft Excel appeared first on Top Tip Bio.

]]>The post How To Create A Correlation Matrix In Microsoft Excel appeared first on Top Tip Bio.

]]>Below is the correlation matrix I will show you how to create.

For this example, I have 10 different variables; each variable has been entered into a different column.

For each variable I have 19 different values – these are just random numbers for the purpose of this example.

The screenshot below shows the top section of my data sheet.

What I want to do is to create a correlation matrix, which contains the Pearson correlation coefficient values between each of my 10 different variables.

Perhaps the easiest way to create a correlation matrix in Excel is to use the Data Analysis ToolPak. This is an add-on created by Microsoft to provide data analysis tools for statistical analyses.

To install this add-on, go to *File*>*Options*.

Then click on *Add-ins*.

At the bottom, you want to manage the *Excel Add-ins*, and click the *Go* button.

Then, ensure you tick the *Analysis ToolPak* add-in, and click *OK*.

Now, when you click on the *Data* ribbon at the top, you should see a *Data Analysis* button in a sub-section called *Analyze*.

Now we are ready to create the correlation matrix.

To create the correlation matrix, go to *Data*>*Data Analysis*.

From the list, select the *Correlation* option and click *OK*.

Now enter the following:

**Input Range**– Enter the cells containing all of the data to include in the analysis**Grouped by**– If your data are organised into different columns, then select the*Columns*option; if your data are organised into different rows, then select the*Rows*option instead**Labels in First Row**– Tick this if you have cells containing label names at the top of your data set

Under this, there are also three output options to select from for where you want the results to be entered:

**Output Range**– Select an area in the current sheet where you want to results to be placed**New Worksheet Ply**– Save the results in a new worksheet, and give it a name**New Workbook**– Save the correlation matrix in a completely separate Excel file

For my example, I’ve selected the second option to have the correlation matrix returned in a new sheet.

Finally, press the *OK* button to run the analysis.

You should now see a correlation matrix has been created.

The top row and first column will list each of the variables entered into the test.

The numbers in the table represent Pearson correlation coefficient values.

So, in my example, the correlation coefficient value for the relationship between Variable 1 and Variable 4 is 0.108.

The correlation coefficient is a value that ranges from +1 to -1.

A value of 0 means there is no linear correlation between the two variables.

A value of +1 means there is a perfectly positive linear correlation between the two variables; so, as one variable increases, so does the other.

You can see in the matrix that anytime there is a correlation between the same variable, the correlation coefficient value is 1. That’s because if you plot two variables with exactly the same values against each other, then you will always get a perfectly positive linear correlation between the two.

A value of -1 means there is a perfectly negative linear correlation between the two variables. So, as one increases, the other tends to decrease.

Also note that only half the matrix is complete, that is because the results would be the same if these empty cells were calculated.

If you would prefer to calculate the Pearson correlation coefficient values yourself, instead of using a matrix, you can do so with the CORREL function.

=CORREL(array1, array2)

**array1**– All cells containing data for the first variable**array2**– All cells containing data for the second variable

If you wanted to go a step further and calculate a p-value for the Pearson correlation test to see if the result is significant, then I refer to my tutorial on performing a Pearson correlation test in Microsoft Excel.

Sometimes, the cells in a correlation matrix are coloured based on their coefficient values. This can easily be done in Excel by using the conditional formatting.

Firstly, highlight all the values in the table, and then go to *Home*>*Conditional Formatting*>*New Rule*.

Select *format all cells based on their values* as the rule type.

In the rule description underneath, use the dropdown menu to change the style to a *3-color scale*.

Then change the colour settings to:

Minimum | Midpoint | Maximum | |
---|---|---|---|

Type | Number | Number | Number |

Value | -1 | 0 | 1 |

Colour | Red | White | Blue |

When using this conditional formatting rule, any cells with a correlation coefficient value of -1 will be coloured red. The cells with a value of 0 will be coloured white and the cells with a value of 1 will be coloured blue. And because this is a gradient of colours, any values in between these points will have a shade of colour that represents their correlation coefficient value.

You can of course use different colours; however, these seem to be the most popular choices when colouring a correlation matrix.

Below is the correlation matrix for my example with the conditional formatting applied.

In this tutorial, I have shown you how to create a correlation matrix in Microsoft Excel.

A correlation matrix can easily be created by using the Data Analysis ToolPak. Once created, you can take advantage of Excel’s conditional formatting colour the cells according to the correlation coefficient values.

Microsoft Excel version used: 365 ProPlus

The post How To Create A Correlation Matrix In Microsoft Excel appeared first on Top Tip Bio.

]]>The post How To Perform A Two-Sample F-Test In Excel (Variance Test) appeared first on Top Tip Bio.

]]>The reason why you would perform a two-sample F-test for variances is to determine if the variances of two populations are equal.

This is important to know because certain hypothesis tests, such as an independent Student t-test, assumes that the two groups in the test have equal variances.

So, it’s important to know if the two groups have equal variance *before* performing an independent student t-test.

For this tutorial, I will be using the same data as used in my other tutorial on how to perform Student t-tests in Excel.

I have two groups: males and females. In each group there are 8 different participants, which I have measured their height in centimetres (cms).

There are a few ways you can perform the F-test in Excel. You can either use the Data Analysis ToolkPak, or the F.TEST function; I’ll show you both methods.

Let’s start with the Data Analysis ToolPak method.

The first method is to use the Data Analysis ToolPak. This is an add-on created by Microsoft to provide data analysis tools for statistical analyses.

To install the toolpak, go to *File*>*Options*, then select *Add-ins*.

At the bottom, you want to manage the Excel add-ins and click the *Go* button.

Then, ensure you tick the *Analysis ToolPak* add-in, and click *OK*.

Now, when you click on the *Data* ribbon at the top, you should see a *Data Analysis* button in a sub-section called *Analyze*.

Before performing the F-test, I advise you calculate the variance for each group beforehand – you will see why this is useful shortly.

To calculate the variance, in a new cell, use the VAR.S function.

=VAR.S(number1, [number2],…)

**Number1**– Range of cells containing the first group data**Number2**– Range of cells containing the second group data

Here are the formulas I used with my example data.

So, for my example data, the variance for males was 23.55 cm^{2} and the variance for females was 25.70 cm^{2}.

Now we are ready to perform the F-test to determine if the two variances are significantly different.

To perform the F-test, go to *Data*>*Data Analysis*.

Then from the list, select the *F Test Two-Sample for Variances* option and click *OK*.

Here is a breakdown of each option.

**Variable 1 Range**– The range of cells containing the first group data. For Variable 1, it is recommended to input the group with the highest variance value so Excel can calculate the correct F value. This is the reason why we calculated the variance for each group beforehand! For my example, I will use the Female group since they had a larger variance compared with the Male group**Variable 2 Range**– The range of cells containing the second group data. For my example, I will enter the Male group data**Labels**– Select this if you have highlighted the group label- Alpha – This is the significance threshold. By default this is set at 0.05. So, if the p<0.05, we can reject the null hypothesis and conclude that the test is statistically significant

The output options underneath are used to determine where the F-test results will be returned.

**Output Range**– This option will allow you to select an area in the current sheet where you want to results to be placed**New Worksheet Ply**– This will save the results in a new sheet**New Workbook**– This option will save the results in an entirely new Excel document

For this example, I’ll select the second option and call the new sheet *Results*.

Finally, press the *OK* button to run the F-test.

The F-test results should now be displayed in a table.

Here’s what my results table looks like.

I’ll now break down each of these results in detail.

**Mean**– This is the mean, or average, values for each of the two groups**Variance**– This is the same values as we calculated just before performing the test**Observations**– The number of data points in each group; so I had 8 participants in each group**df**– The degrees of freedom. This is calculated by performing observations – 1**F**– The F-value for the test. This is calculated by dividing the variance for the first variable, by the variance of the second variable**P(F<=f) one-tail**– The p-value for the F-test. Note, this is a one-tail analysis**F Critical one-tail**– The F Critical value for the test. Note, this is a one-tail analysis

Basically, Excel uses the degrees of freedom (df) for the two groups, as well as the previously selected alpha level, to work out the F Critical value.

By comparing the F-value to the F Critical value, we can determine the p-value.

For my example, p=0.46.

To interpret this value, we need to declare our two hypotheses.

**Null hypothesis**– There is no difference in variance of height measures between the two populations**Alternative hypothesis**– Females have a higher variance for height measures compared with males

Note, that the alternative hypothesis is written in this way because the test performed here is a one-tailed test; I will show you how to calculate a p-value for a two-tailed test shortly.

If p<0.05, we would reject the null hypothesis and accept the alternative hypothesis.

On the other hand, we would fail to reject the null hypothesis if P>0.05.

Since my p-value is larger than 0.05, I will fail to reject the null hypothesis and conclude that there is no difference in variance of height measures between the two populations.

If this was the case, I could proceed with a T-test and assume the populations have equal variance.

If my p-value was less than 0.05, then I would assume the two populations have unequal variance when performing the T-test.

There is also a function you can use that will return the two-tailed p-value for the test, instead of the one-tailed p-value created through the Data Analysis ToolPak.

To do this, use the F.TEST function

=F.TEST(array1, array2)

**Array1**– Range of cells containing the first group data**Array2**– Range of cells containing the second group data

Notice that this p-value is different to that reported by the Data Analysis ToolPak; infact, it is twice the value.

This is because the F.TEST function returns the two-tailed p-value, whereas the Data Analysis ToolPak only returns the one-tailed p-value.

To interpret this p-value we again need to report our two hypotheses.

**Null hypothesis**– There is no difference in variance of height measures between the two populations**Alternative hypothesis**– There is a difference in variance of height measures between the two populations

Since the p-value is greater than my alpha of 0.05, we fail to reject the null hypothesis and conclude again that there is no difference in variance of height measures between the two populations.

In this tutorial, I have shown you how to perform a two-sample F-test for variances in Microsoft Excel.

The one-tailed F-test can be performed by using the Data Analysis ToolPak. To perform a two-tailed F-test, use the F.TEST function to return the two-tailed p-value.

Microsoft Excel version used: 365 ProPlus

The post How To Perform A Two-Sample F-Test In Excel (Variance Test) appeared first on Top Tip Bio.

]]>