Phi Coefficient

The StatsTest Flow: Relationship >> Two Categorical >> Two Values per Variable

Not sure this is the right statistical method? Use the Choose Your StatsTest workflow to select the right method.


What is the Phi Coefficient?

The Phi Coefficient measures the strength of the relationship between two variables. To use it, both of your variables of interest must be binary, as described below.

The Phi Coefficient is also called the mean square contingency coefficient.
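For reference, here is one common way to write the Phi Coefficient in terms of the counts in a 2×2 table. The notation is ours, not this article's. Code each variable as 1 or 0. Let n_{11} be the number of observations where both variables equal 1, and n_{10}, n_{01}, n_{00} the counts for the other three combinations. Let n_{1·} and n_{0·} be the row totals, n_{·1} and n_{·0} the column totals, and n the total count. The second identity uses the Pearson chi-square statistic χ² without continuity correction, and it explains the name "mean square contingency coefficient."

```latex
\phi = \frac{n_{11}\,n_{00} - n_{10}\,n_{01}}
            {\sqrt{n_{1\cdot}\, n_{0\cdot}\, n_{\cdot 1}\, n_{\cdot 0}}},
\qquad
\phi^{2} = \frac{\chi^{2}}{n}
```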


Assumptions for the Phi Coefficient

Every statistical method has assumptions: properties your data must satisfy for the method's results to be accurate.

The assumptions for the Phi Coefficient include:

  1. Binary variables

Let’s dive into what that means.

Binary

For this test, your two variables must be binary. A binary variable is a categorical variable with only two possible values. Some good examples of binary variables include gender (male/female) or any True/False or Yes/No variable.
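Before computing Phi, you can check this assumption directly. The sketch below uses a hypothetical helper, `is_binary`, that counts the distinct values a variable takes. It assumes missing values have already been removed. A variable with only one observed value also fails the check, and Phi is undefined for such a variable anyway.

```python
from typing import Hashable, Iterable


def is_binary(values: Iterable[Hashable]) -> bool:
    """Return True if the variable takes exactly two distinct values.

    Hypothetical helper; assumes missing values were removed beforehand.
    """
    return len(set(values)) == 2


# Hypothetical example data, for illustration only.
smoker = ["yes", "no", "no", "yes", "no"]
age_group = ["young", "middle", "old", "young", "old"]

print(is_binary(smoker))     # True
print(is_binary(age_group))  # False: three categories
```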


When to use the Phi Coefficient?

You should use the Phi Coefficient in the following scenario:

  1. You want to know the relationship between two variables
  2. Your variables of interest are binary
  3. You have only two variables

Let’s clarify these to help you know when to use the Phi Coefficient.

Relationship

You are looking for a statistical test that describes how two variables are related. Other types of analysis include testing for a difference between two variables, or using one variable to predict another (prediction).

Binary

As noted in the assumptions above, both of your variables must be binary, with only two possible values each.

If both of your variables are continuous, you may want to use Pearson Correlation. If one variable is continuous and the other is binary, use Point Biserial Correlation. If either variable has more than two categories, use Cramér's V.
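The choice described above can be summarized as a small lookup. This is a sketch only. The function name and the "binary" / "categorical" / "continuous" labels are hypothetical, and combinations this section does not discuss raise an error instead of guessing.

```python
def suggest_relationship_measure(kind_a: str, kind_b: str) -> str:
    """Suggest a relationship measure for two variables (hypothetical helper).

    Each kind is one of the illustrative labels "binary", "categorical"
    (more than two categories) or "continuous".
    """
    allowed = {"binary", "categorical", "continuous"}
    kinds = {kind_a, kind_b}  # the order of the two variables does not matter
    if not kinds <= allowed:
        raise ValueError(f"unknown variable kind(s): {sorted(kinds - allowed)}")

    if kinds == {"binary"}:
        return "Phi Coefficient"
    if kinds == {"continuous"}:
        return "Pearson Correlation"
    if kinds == {"binary", "continuous"}:
        return "Point Biserial Correlation"
    if "continuous" not in kinds:
        # At least one variable has more than two categories.
        return "Cramér's V"
    # A multi-category variable paired with a continuous one is not covered here.
    raise ValueError(f"no recommendation in this section for {sorted(kinds)}")


print(suggest_relationship_measure("binary", "binary"))      # Phi Coefficient
print(suggest_relationship_measure("binary", "continuous"))  # Point Biserial Correlation
```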

Two Variables

The Phi Coefficient measures the relationship between exactly two variables.


Phi Coefficient Example

Variable 1: Gender
Variable 2: Heart Disease Diagnosis

In this example, we want to investigate the relationship between gender and heart disease. To begin, we record both variables for each person in a group of people.

Because both variables are binary, with only two possible values each (male/female, yes/no), we know that the Phi Coefficient is a suitable choice.
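As a sketch of the calculation, the function below tallies the four cells of the 2×2 table from paired 0/1 data and applies the count formula for Phi. The data are hypothetical. The coding (1 = female, 1 = diagnosed) is an assumption made only for this illustration, and `phi_from_binary` is a hypothetical helper, not a library function. For 0/1-coded variables, Phi equals Pearson's correlation, so NumPy's `corrcoef` gives the same number.

```python
import math

import numpy as np


def phi_from_binary(x, y) -> float:
    """Phi coefficient for two paired variables coded as 0/1 (hypothetical helper).

    Counts the four cells of the 2x2 table, then applies
    phi = (n11*n00 - n10*n01) / sqrt(row1 * row0 * col1 * col0).
    """
    x = np.asarray(x)
    y = np.asarray(y)
    # Convert to Python ints so the product of the four margins cannot overflow.
    n11 = int(np.sum((x == 1) & (y == 1)))
    n10 = int(np.sum((x == 1) & (y == 0)))
    n01 = int(np.sum((x == 0) & (y == 1)))
    n00 = int(np.sum((x == 0) & (y == 0)))

    margins = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if margins == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (n11 * n00 - n10 * n01) / math.sqrt(margins)


# Hypothetical data for ten people, for illustration only.
# gender: 1 = female, 0 = male; heart_disease: 1 = yes, 0 = no.
gender = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
heart_disease = [0, 0, 0, 1, 1, 1, 1, 0, 0, 1]

print(round(phi_from_binary(gender, heart_disease), 3))  # -0.408
# For 0/1 data, Pearson's r gives the same value.
print(round(float(np.corrcoef(gender, heart_disease)[0, 1]), 3))  # -0.408
```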

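The analysis will produce a Phi Coefficient and a p-value. Phi values range from -1 to 1, and values near 0 indicate little or no relationship. Binary variables are usually coded as 0 and 1, so the sign of Phi depends on which category of each variable is coded as 1. A positive value means the two variables tend to take their 1-coded categories together (and their 0-coded categories together). A negative value means the variables are inversely related: when one variable takes its 1-coded category, the other tends to take its 0-coded category. Reversing the coding of one variable flips the sign of Phi but leaves its strength unchanged.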
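To see why the sign depends on coding, the sketch below computes Phi for hypothetical 0/1 data and then again after swapping which gender is coded as 1. It relies on the fact that, for two 0/1 variables, Pearson's correlation (`np.corrcoef`) equals Phi. The data and coding are assumptions for illustration only.

```python
import numpy as np

# Hypothetical 0/1-coded data, for illustration only.
# gender: 1 = female, 0 = male; heart_disease: 1 = yes, 0 = no.
gender = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
heart_disease = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 1])

# For two 0/1 variables, Pearson's r computed by np.corrcoef equals phi.
phi = float(np.corrcoef(gender, heart_disease)[0, 1])

# Reverse the coding of gender (now 1 = male, 0 = female).
phi_recoded = float(np.corrcoef(1 - gender, heart_disease)[0, 1])

print(round(phi, 3), round(phi_recoded, 3))  # -0.408 0.408
```

Only the labels changed, so the strength of the relationship (0.408) is the same in both codings.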

The p-value is the probability of seeing a relationship at least as strong as the one we observed if there were no actual relationship between our variables. By the common convention, a p-value less than or equal to 0.05 means our result is statistically significant: a relationship this strong would be unlikely if it were due to chance alone.
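One common way to get both numbers in Python, assuming SciPy is installed, is to run Pearson's chi-square test of independence on the 2×2 table. The counts below are hypothetical. With `correction=False` (SciPy applies Yates' continuity correction to 2×2 tables by default), Phi squared equals χ²/n. This gives the magnitude of Phi, and the sign comes from the cell counts. The chi-square p-value is an approximation that becomes unreliable when expected cell counts are small. In that case, Fisher's exact test (`scipy.stats.fisher_exact`) is a common alternative.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table of counts, for illustration only.
# Rows: female, male. Columns: heart disease yes, no.
table = np.array([[30, 70],
                  [50, 50]])

# correction=False gives the plain Pearson chi-square, so phi**2 == chi2 / n.
chi2, p_value, dof, expected = chi2_contingency(table, correction=False)

n = table.sum()
(a, b), (c, d) = table
# chi2 only gives the magnitude of phi; the sign comes from a*d - b*c.
phi = np.sign(a * d - b * c) * np.sqrt(chi2 / n)

print(f"phi = {phi:.3f}, p = {p_value:.4f}")
```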

Frequently Asked Questions

Q: How do I get the Phi Coefficient in SPSS or R?
A: StatsTest is focused on helping you pick the right statistical method every time. There are many resources available to help you figure out how to run this method with your data:
SPSS article: http://www.pmean.com/definitions/phi.htm
SPSS video: https://www.youtube.com/watch?v=HqQzeOmPl0o
R article: https://www.rdocumentation.org/packages/psych/versions/1.0-17/topics/phi

