What Is T Test? What Are The Types Of T Test? How Does It Work?
“As a product leader, it’s your responsibility to verify your assumptions and hypotheses based on data analysis. Assuming less & verifying more has to be the way you execute your responsibilities.”
We covered hypothesis testing earlier in Part 1 & Part 2 of Inferential Statistics: Hypothesis Testing, where we learned about the Normal Deviate Z Test.
As promised in my last article on hypothesis testing, today we will cover in detail another popular hypothesis test: the T Test. We will cover this journey in the following manner:
- What Is T-Test?
- What Are The Types Of T-Tests?
- Calculating T-Test Statistics
- One Sample T-Test With Example
Excited? Let’s get started, and once again, thanks for being a part of this journey.
What Is T Test?
It is a parametric test that tells you how significant the differences between groups are. In other words, it lets you know whether those differences (measured in means/averages) could have happened by chance.
T-tests are so called because the test results are all based on t-values.
A t-test uses the t-statistic, the t-distribution values, and the degrees of freedom to determine whether the difference between two sets of data is statistically significant.
T values : T test Statistics
T-values are an example of test statistics. A test statistic is a standardized value that is calculated from sample data during a hypothesis test. The procedure that calculates the test statistic compares your data to what is expected under the null hypothesis.
To perform a t-test calculation we require three key data values.
- The difference between the mean values from each data set (called the mean difference),
- The standard deviation of each group
- The number of data values of each group.
As per Investopedia:
The denominator of the t-value measures the dispersion, or variability, of the data.
Higher values of the t-value, also called the t-score, indicate that a large difference exists between the two sample sets relative to that variability.
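To make the roles of the three inputs concrete, here is a minimal sketch that combines a mean difference, the group standard deviations, and the group sizes into a t-value. It assumes the unequal-variance (Welch) form of the two-sample statistic. That choice, the helper name `t_value`, and the numbers are illustrative assumptions, not the formula from the original figure.

```python
import math

def t_value(mean1, mean2, sd1, sd2, n1, n2):
    """Two-sample t-value (Welch form, assumed here for illustration)."""
    mean_difference = mean1 - mean2
    # Denominator: standard error of the difference, built from each
    # group's standard deviation and number of data values.
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return mean_difference / standard_error

# Hypothetical summary statistics for two groups
t = t_value(mean1=5.0, mean2=4.5, sd1=1.0, sd2=1.2, n1=30, n2=30)
print("t-value:", round(t, 3))
```

A larger mean difference, or a smaller spread in either group, pushes the t-value further from zero.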
Degrees of Freedom In T Test:
Degrees of freedom refer to the values in a study that have the freedom to vary, and they are essential for assessing the importance and validity of the null hypothesis. Their computation usually depends on the number of data records available in the sample set.
Assumptions Of T Test:
- The first assumption concerns the scale of measurement. A t-test assumes that the data collected are measured on a continuous or ordinal scale.
- The second assumption concerns simple random sampling. The data are assumed to be collected from a representative, randomly selected portion of the total population.
- The third assumption is that the data, when plotted, follow a normal, bell-shaped distribution curve.
- The fourth assumption is that a reasonably large sample size is used for the test. A larger sample size means the distribution of results should approach a normal bell-shaped curve.
- The final assumption is homogeneity of variance. Homogeneous, or equal, variance exists when the standard deviations of the samples are approximately equal.
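Two of these assumptions, normality and homogeneity of variance, can be checked numerically. The sketch below uses SciPy's Shapiro-Wilk and Levene tests, which are one common choice rather than anything the article prescribes. The two groups are synthetic data generated only for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical measurements for two groups (synthetic, for illustration only)
group_a = rng.normal(loc=4.0, scale=0.2, size=40)
group_b = rng.normal(loc=4.1, scale=0.2, size=40)

# Normality: Shapiro-Wilk tests H0 "the data come from a normal distribution"
shapiro_stat, shapiro_p = stats.shapiro(group_a)

# Homogeneity of variance: Levene's test checks H0 "the groups have equal variances"
levene_stat, levene_p = stats.levene(group_a, group_b)

print("Shapiro-Wilk p-value:", shapiro_p)
print("Levene p-value:", levene_p)
```

A small p-value in either test (for example, below 0.05) would suggest the corresponding assumption may not hold for that data.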
What Are The Types Of T Tests & How To Select One Of Them?
There are three types of t-test:
One sample t-test
- Used to compare a sample mean with a known population mean or some other meaningful, fixed value.
Independent samples t-test
- Used to compare two means from independent groups
Paired samples t-test
- Used to compare two means that are repeated measures for the same participants; scores might be repeated across different measures or across time.
- Used also to compare paired samples, as in a two treatment randomized block design.
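As a rough illustration of how the three types map onto code, the sketch below calls the corresponding SciPy functions (`ttest_1samp`, `ttest_ind`, `ttest_rel`). All data are synthetic, and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic data, for illustration only
before = rng.normal(loc=50, scale=5, size=20)
after = before + rng.normal(loc=2, scale=1, size=20)  # same participants, measured again
other_group = rng.normal(loc=53, scale=5, size=25)    # unrelated participants

one_sample = stats.ttest_1samp(before, popmean=50)    # sample mean vs a fixed value
independent = stats.ttest_ind(before, other_group)    # two independent groups
paired = stats.ttest_rel(before, after)               # repeated measures

for name, result in [("one-sample", one_sample),
                     ("independent", independent),
                     ("paired", paired)]:
    print(name, "t =", round(result.statistic, 3), "p =", round(result.pvalue, 4))
```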
How To Decide Which T test Type To Select For Your Test?
The flowchart below can be used to determine which t-test to use based on the characteristics of the sample sets. To select a t-test type, you need to consider whether the sample records are similar, the number of data records in each sample set, and the variance of each sample set.
One Sample T Test: (Manually)
We will perform the One Sample T Test in the following steps:
- State Hypothesis:
- Compute T Test Statistics
- Compute Critical value using T table (For Two Tailed Test We need to find the critical cut-off value)
- Evaluate null hypothesis
Let’s try to understand the concept using one simple problem :
Example Problem : Source
Q. A coffee shop relocates to Italy and wants to make sure that all lattes are consistent. They believe that each latte has an average of 4 Oz of espresso. If this is not the case, they must increase or decrease the amount. A random sample of 25 lattes shows a mean of 4.6 Oz of espresso and a standard deviation of .22 Oz. Use alpha = .05 and run a one-sample t-test to compare with the known population mean.
Solution: Two Tailed One sample T Test:
1. Let’s State Hypothesis:
Null Hypothesis H0: There is no significant difference between the sample mean (M) of espresso in a latte and the population mean μ; the true mean is
μ = 4 oz.
Alternative Hypothesis HA: There is a significant difference between the sample mean M and the population mean μ, so the mean amount of espresso in a latte is not equal to 4 oz.
The purpose of the one sample t-test is to determine whether the null hypothesis should be rejected, given the sample data. Here we will perform a two tailed (or two sided) one sample t test.
Remember! We are performing a two tailed t test here because we care about a difference in either direction, not specifically whether the mean is less than or greater than a given value.
2. Compute T test Statistics:
As seen in fig 1.0 above,
T test = (sample mean - population mean)/(stddev/sqrt(n))
Here are the known values given in the example:
Sample Mean M = 4.6 oz
Population Mean μ = 4 oz.
Sample standard deviation = 0.22 oz.
Sample size n = 25
So we can calculate,
Degrees Of Freedom, df = Sample size - 1 = 25 - 1 = 24.
So if we substitute all the known values into the t-test formula:
T test = (4.6 - 4)/(0.22/sqrt(25)) = 0.6/0.044 ≈ 13.64
So our t test value comes out to be approximately 13.6.
3. Compute Critical Value For Two Tailed One Sample Test:
Now that we know the t test value = 13.6,
the degrees of freedom df = 24,
and the given alpha value, which is .05,
we can refer to the T table to find the critical values (tc) for the two tailed test.
In the t table given below, go to the row for df = 24 and navigate to the t.975 column, which corresponds to an alpha of 0.05 for a two tailed test (0.025 in each tail). You will get the cut-off value
tc = +/- 2.064,
which is the cut-off point for rejecting the null hypothesis.
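The table lookup can be cross-checked in code. As a small sketch, SciPy's `stats.t.ppf` returns the quantile of the t-distribution, so the 0.975 quantile with 24 degrees of freedom should reproduce the tabulated cut-off. The variable names are illustrative.

```python
from scipy import stats

alpha = 0.05
df = 24
# Two-tailed test: alpha/2 goes in each tail, so look up the 1 - alpha/2 quantile
t_critical = stats.t.ppf(1 - alpha / 2, df)
print("Critical values: +/-", round(t_critical, 3))
```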
4. Accepting / Rejecting our Null Hypothesis: Using The region of acceptance method.
Since our t-test result of 13.6 is greater than 2.064, it falls in the rejection region (t > 2.064 or t < -2.064), so we can conclude that there is a significant difference between the sample mean amount of espresso in the lattes in Italy and the expected population amount.
Hence we disagree with our null hypothesis, which stated that there is no significant difference between the sample mean (M) of espresso in a latte and the population mean μ.
So we reject H0 here & accept HA, our alternative hypothesis.
Because the sample mean (4.6 oz) is above 4 oz, we can say that too much espresso is being placed in the lattes in Italy, and the amount should be reduced to meet the expected (population) mean.
One Sample T Test Using Python & Jupyter Notebook:
We will solve the same example using jupyter notebook & python functions/packages:
Stating Hypothesis & Calculating T test Stat:
Go to Anaconda (www.anaconda.com) and download it to get started. It includes everything from Python and its associated packages to Jupyter Notebook.
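# Write the code snippet given below in your notebook
# H0: The mean of espresso in latte is not different from the population mean and is = 4 oz.
# Ha: The mean of espresso in latte is significantly different from the population mean and is not = 4 oz.
import numpy as np
sample_mean = 4.6
population_mean = 4
sample_size = 25
degree_of_freedom = sample_size - 1
standard_deviation = 0.22
# t statistic = (sample mean - population mean) / standard error of the mean
Ttest = (sample_mean - population_mean) / (standard_deviation / np.sqrt(sample_size))
print('Degree Of Freedom is : ', degree_of_freedom)
print('T test is : ', Ttest)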
As you can see above, I have defined the null & alternative hypotheses and calculated the T test statistic in a Python notebook. The output will look like the image given below:
Here, T test comes out to be ≈ 13.64 (which matches what we calculated manually earlier).
Calculating P- Value:
Write the below given code in your notebook:
# We will use the scipy stats module to calculate the p-value as shown below
import scipy.stats as stats
# p is the cumulative probability P(T <= Ttest) with 24 degrees of freedom
p = stats.t.cdf(Ttest, df=24)
# Since we are doing a two sided test, the final p-value is twice the
# upper-tail probability beyond |Ttest|
pvalue = stats.t.sf(abs(Ttest), 24) * 2
print("p is:", p)
print("pvalue is:", pvalue)
if pvalue > 0.05:
    print('The mean of espresso in latte is not different from population Mean (Fail To Reject H0)')
else:
    print('The mean of espresso in latte is significantly different from population Mean (Reject H0)')
When you run the code you will get the following outcome:
Accepting / Rejecting our Null Hypothesis: Using P Value
As per our calculation (shown above), pvalue is: 8.48542925590331e-13. If you convert this value using an online scientific notation to decimal converter, it will come out to be
pvalue= 0.000000000000848542925590331 .
This is < 0.05 (in the majority of analyses, an alpha of 0.05 is used & accepted as the cutoff for significance by data science engineers). Hence we can reject our null hypothesis H0 and conclude that:
The mean of espresso in latte is significantly different from the population mean.
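As a cross-check on the manual calculation, SciPy's `ttest_1samp` computes the t statistic and two-sided p-value directly from raw data. The problem only gives summary statistics, so the sketch below builds a synthetic sample of 25 values rescaled to have exactly the stated mean (4.6) and sample standard deviation (0.22). The individual values are invented for illustration and are not the coffee shop's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
raw = rng.normal(size=25)
# Rescale so the sample has exactly mean 4.6 and sample standard deviation 0.22
z = (raw - raw.mean()) / raw.std(ddof=1)
lattes = 4.6 + 0.22 * z

result = stats.ttest_1samp(lattes, popmean=4)
print("t statistic:", result.statistic)
print("two-sided p-value:", result.pvalue)
```

Because the t statistic depends only on the sample mean, standard deviation, and size, any sample with these summary values gives the same result as the manual calculation.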
Independent Sample T Test:
What Is Independent T-test?
The independent t-test, also called the two sample t-test, independent-samples t-test or Student’s t-test, is a type of inferential statistical test which determines:
Whether there is a statistically significant difference between the means in two unrelated groups.
Null and alternative hypotheses for the independent t-test
The null hypothesis H0:
The population means from the two unrelated groups are equal
H0: u1 = u2
The alternative hypothesis HA:
The population means are not equal
HA: u1 ≠ u2
Significance level (also called alpha): the threshold used to either reject or fail to reject the null hypothesis, generally set at 0.05.
Pre-Requisites To Conduct Independent t-test:
- One independent, categorical variable that has two levels/groups.
- One continuous dependent variable.
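A minimal sketch of this setup follows, assuming a hypothetical two-level grouping ("control" and "treatment") and a synthetic continuous score. It runs SciPy's `ttest_ind` both with the equal-variance assumption (Student) and without it (Welch, via `equal_var=False`), then applies the 0.05 significance level.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Categorical independent variable with two levels: "control" and "treatment"
# Continuous dependent variable: a synthetic score for each participant
control = rng.normal(loc=70, scale=8, size=30)
treatment = rng.normal(loc=75, scale=8, size=30)

student = stats.ttest_ind(control, treatment)                 # assumes equal variances
welch = stats.ttest_ind(control, treatment, equal_var=False)  # does not assume equal variances

alpha = 0.05
for name, result in [("Student", student), ("Welch", welch)]:
    decision = "reject H0" if result.pvalue < alpha else "fail to reject H0"
    print(name, "t =", round(result.statistic, 3),
          "p =", round(result.pvalue, 4), "->", decision)
```

With equal group sizes the two variants give the same t statistic and differ only in the degrees of freedom used for the p-value.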
We will cover other important caveats of Two sample / Independent t-test with examples in our next part of hypothesis testing in inferential statistics. “Hypothesis Testing Using T Test : Inferential Statistics Part4”
Let me leave you all with a pictorial summary of One Sample T Test :
Always Remember one key concept that :
While performing the hypothesis test, an individual may commit the following types of error:
- Type-I Error: A true null hypothesis is rejected, i.e. the hypothesis is rejected when it should be accepted. The probability of committing a type-I error is denoted by α and is called the level of significance.
- If α = Pr[type-I error] = Pr[reject H0 | H0 is true]
Then (1-α) = Pr[accept H0 | H0 is true]
(1-α) corresponds to the confidence level.
- Type-II Error: A false null hypothesis is accepted, i.e. the hypothesis is accepted when it should be rejected. The probability of committing a type-II error is denoted by β.
- If β = Pr[type-II error] = Pr[accept H0 | H0 is false]
Then (1-β) = Pr[reject H0 | H0 is false]
(1-β) = power of a statistical test.
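The meaning of α can be seen in a small simulation. In the sketch below, every sample is drawn from a population whose mean really is 4.0, so the null hypothesis is true by construction and every rejection is a type-I error. The simulation settings (2000 runs, samples of 25, standard deviation 0.22) are assumptions chosen to echo the latte example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_simulations = 2000

# Every sample is drawn with true mean 4.0, so H0 (mean = 4.0) is true each time
samples = rng.normal(loc=4.0, scale=0.22, size=(n_simulations, 25))
result = stats.ttest_1samp(samples, popmean=4.0, axis=1)

# Any rejection here is a type-I error by construction
type_1_rate = np.mean(result.pvalue < alpha)
print("Observed type-I error rate:", type_1_rate)
```

The observed rate should land close to α, which is what setting a significance level of 0.05 is meant to control.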
We will cover this concept in detail later. Till then Keep learning & keep posting your queries, I will be happy to assist .