 # Detecting Multicollinearity with VIF – Python

Multicollinearity occurs when two or more independent variables in a multiple regression model are highly correlated with one another. When some features are highly correlated, it becomes difficult to distinguish their individual effects on the dependent variable. Multicollinearity can be detected using various techniques, one of which is the Variance Inflation Factor (VIF).

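In the VIF method, we pick each feature in turn and regress it against all of the other features. For each regression, the factor is calculated as:

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$

where $R_i^2$ is the coefficient of determination (R-squared) of the regression of feature $i$ on the remaining features. R-squared lies between 0 and 1, so VIF is at least 1 and grows without bound as R-squared approaches 1.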
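The procedure described above can be sketched by hand with NumPy. This is a minimal illustration, not the statsmodels implementation: the helper name `vif_manual` is hypothetical, the data is synthetic, and adding an intercept column to each auxiliary regression is an assumption of this sketch.

```python
import numpy as np

def vif_manual(X):
    """Return the VIF of every column of X (hypothetical helper).

    Each column is regressed on all the other columns plus an
    intercept (an assumption of this sketch), and the resulting
    R-squared is turned into VIF = 1 / (1 - R-squared).
    """
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = []
    for j in range(n_cols):
        y = X[:, j]                                  # feature being explained
        others = np.delete(X, j, axis=1)             # all remaining features
        A = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residuals = y - A @ coef
        ss_res = np.sum(residuals ** 2)
        ss_tot = np.sum((y - y.mean()) ** 2)
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)

# Synthetic data: b is nearly a copy of a, c is unrelated to both.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.1 * rng.normal(size=200)
c = rng.normal(size=200)

manual_vifs = vif_manual(np.column_stack([a, b, c]))
print(manual_vifs)
```

The first two columns should come out with large VIFs because each is almost fully explained by the other, while the third should stay close to 1.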

The larger the R-squared of a feature's regression on the other features, the larger its VIF. A high VIF therefore means that the feature is largely explained by the other features, that is, it is strongly collinear with them. As a common rule of thumb, a VIF above 5 indicates high multicollinearity.
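To make the threshold concrete, the short sketch below converts a few R-squared values into VIFs using the standard definition VIF = 1 / (1 - R-squared). The function name `vif_from_r_squared` is only illustrative.

```python
def vif_from_r_squared(r_squared):
    """Convert an auxiliary-regression R-squared into a VIF."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R-squared must be in [0, 1) for a finite VIF")
    return 1.0 / (1.0 - r_squared)

# A few reference points on the R-squared scale.
for r2 in (0.0, 0.5, 0.8, 0.9):
    print(f"R-squared = {r2:.1f}  ->  VIF = {vif_from_r_squared(r2):.2f}")
```

Read this way, a VIF of 5 corresponds to an R-squared of 0.8, and a VIF of 10 to an R-squared of 0.9.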

Implementing VIF using statsmodels:

statsmodels provides a function named variance_inflation_factor() for calculating VIF.

Syntax : statsmodels.stats.outliers_influence.variance_inflation_factor(exog, exog_idx)

Parameters :

• exog : a 2-D array (or DataFrame) holding all the features, one column per feature.
• exog_idx : index of the column in exog whose VIF is to be computed; that column is regressed on all the other columns.
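A minimal call looks like the sketch below. The array is synthetic and the variable names are illustrative; only `variance_inflation_factor(exog, exog_idx)` itself comes from statsmodels.

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic features: x1 is almost a multiple of x0, x2 is independent noise.
rng = np.random.default_rng(42)
x0 = rng.normal(size=300)
x1 = 2.0 * x0 + 0.05 * rng.normal(size=300)
x2 = rng.normal(size=300)

exog = np.column_stack([x0, x1, x2])   # one column per feature

# One call per column: exog_idx selects the column being explained.
vifs = [variance_inflation_factor(exog, idx) for idx in range(exog.shape[1])]
print(vifs)
```

The function returns a single float per call, so looping over the column indices is the usual way to get a VIF for every feature.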

Let us apply the method to an example dataset.

The dataset :

The dataset used in the example below contains the height, weight, gender and Body Mass Index for 500 persons. Here the dependent variable is Index.

    import pandas as pd

    # the dataset
    data = pd.read_csv('BMI.csv')

    # printing first few rows
    print(data.head())

Output :

   Gender  Height  Weight  Index
0    Male     174      96      4
1    Male     189      87      2
2  Female     185     110      4
3  Female     195     104      3
4    Male     149      61      3

Approach :

• Each feature index is passed to variance_inflation_factor() to find the corresponding VIF.
• These values are stored in a Pandas DataFrame.

    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    # encoding gender as a 0/1 dummy variable
    data['Gender'] = data['Gender'].map({'Male': 0, 'Female': 1})

    # the independent variables set
    X = data[['Gender', 'Height', 'Weight']]

    # VIF dataframe
    vif_data = pd.DataFrame()
    vif_data["feature"] = X.columns

    # calculating VIF for each feature
    vif_data["VIF"] = [variance_inflation_factor(X.values, i)
                       for i in range(len(X.columns))]

    print(vif_data)

Output :

  feature        VIF
0  Gender   2.028864
1  Height  11.623103
2  Weight  10.688377

As we can see, Height and Weight have very high VIF values (both above 10), which indicates that each of them can be largely predicted from the other features. This is expected, as a person's height does influence their weight. Hence, using these two features together leads to a model with high multicollinearity.

Last Updated on March 1, 2022 by admin
