# Detecting Multicollinearity with VIF – Python

## Detecting Multicollinearity with VIF – Python

Multicollinearity occurs when there are two or more independent variables in a multiple regression model, which have a high correlation among themselves. When some features are highly correlated, we might have difficulty in distinguishing between their individual effects on the dependent variable. Multicollinearity can be detected using various techniques, one such technique being the Variance Inflation Factor(VIF).

In VIF method, we pick each feature and regress it against all of the other features. For each regression, the factor is calculated as :

Where, R-squared is the coefficient of determination in linear regression. Its value lies between 0 and 1.

As we see from the formula, greater the value of R-squared, greater is the VIF. Hence, greater VIF denotes greater correlation. This is in agreement with the fact that a higher R-squared value denotes a stronger collinearity. Generally, a VIF above 5 indicates a high multicollinearity.

Implementing VIF using statsmodels:

statsmodels provides a function named variance_inflation_factor() for calculating VIF.

Syntax : statsmodels.stats.outliers_influence.variance_inflation_factor(exog, exog_idx)

Parameters :

• exog : an array containing features on which linear regression is performed.
• exog_idx : index of the additional feature whose influence on the other features is to be measured.

Let us see an example to implement the method on this dataset.

The dataset :

The dataset used in the example below, contains the height, weight, gender and Body Mass Index for 500 persons. Here the dependent variable is Index.

 import pandas as pd    # the dataset data = pd.read_csv('BMI.csv')   # printing first few rows print(data.head())

Output :

   Gender  Height  Weight  Index
0    Male     174      96      4
1    Male     189      87      2
2  Female     185     110      4
3  Female     195     104      3
4    Male     149      61      3

Approach :

• Each of the feature indices are passed to variance_inflation_factor() to find the corresponding VIF.
• These values are stored in the form of a Pandas DataFrame.
 from statsmodels.stats.outliers_influence import variance_inflation_factor   # creating dummies for gender data['Gender'] = data['Gender'].map({'Male':0, 'Female':1})   # the independent variables set X = data[['Gender', 'Height', 'Weight']]   # VIF dataframe vif_data = pd.DataFrame() vif_data["feature"] = X.columns   # calculating VIF for each feature vif_data["VIF"] = [variance_inflation_factor(X.values, i)                           for i in range(len(X.columns))]   print(vif_data)

Output :

  feature        VIF
0  Gender   2.028864
1  Height  11.623103
2  Weight  10.688377

As we can see, height and weight have very high values of VIF, indicating that these two variables are highly correlated. This is expected as the height of a person does influence their weight. Hence, considering these two features together leads to a model with high multicollinearity.

Last Updated on March 1, 2022 by admin

## Python – Merge two lists into list of tuplesPython – Merge two lists into list of tuples

Python | Merge two lists into list of tuples Given two lists, write a Python

## Python program to count number of vowels using sets in given stringPython program to count number of vowels using sets in given string

Python program to count number of vowels using sets in given string Given a string,

## Python – Sentiment Analysis using VADERPython – Sentiment Analysis using VADER

Python | Sentiment Analysis using VADER Sentiment Analysis is the process of ‘computationally’ determining whether a

## Python – Convert list into list of listsPython – Convert list into list of lists

Python | Convert list into list of lists Given a list of strings, write a

## Matplotlib.pyplot.savefig() in PythonMatplotlib.pyplot.savefig() in Python

Matplotlib.pyplot.savefig() in Python Matplotlib is highly useful visualization library in Python. It is a multi-platform data

## Python bytearray() functionPython bytearray() function

Python | bytearray() function bytearray() method returns a bytearray object which is an array of given

## Simple registration form using Python TkinterSimple registration form using Python Tkinter

Simple registration form using Python Tkinter Python provides the Tkinter toolkit to develop GUI applications.

## sys.stdout.write in Pythonsys.stdout.write in Python

sys.stdout.write in Python This is a built-in Python module that contains parameters specific to the