# Finding the Quantile and Decile Ranks of a Pandas DataFrame column

A quantile is a cut point that divides a sample into equal-sized, adjacent subgroups.

The median is a quantile: it sits in a distribution so that half of the data lies below it and half lies above it. Because the median cuts a distribution into two equal parts, it is sometimes called the 2-quantile.

Quartiles are also quantiles; they divide the distribution into four equal parts.

Percentiles are quantiles that divide a distribution into 100 equal parts and deciles are quantiles that divide a distribution into 10 equal parts.

We can use the following formula to estimate the position of the ith observation in the ordered data:

`ith observation = q (n + 1)`

where q is the quantile (the proportion of values that lie below the ith value you are looking for) and n is the number of items in the data set.

So, to find the Quantile rank by quartiles, q should be 0.25, as we want to divide our data set into 4 equal parts and rank the values from 0 to 3 based on the quartile they fall in.

Similarly, for the Decile rank, q should be 0.1, as we want our data set to be divided into 10 equal parts.
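
As a small illustration of how the number of groups maps to q, the sketch below lists the cumulative proportions at which the cuts fall. The helper name `cut_proportions` is made up for this example and is not a pandas function.

```python
import numpy as np


def cut_proportions(groups):
    """Return the proportions 0, 1/groups, 2/groups, ..., 1 (hypothetical helper)."""
    return np.linspace(0, 1, groups + 1)


print(cut_proportions(4))   # steps of 0.25 for quartiles
print(cut_proportions(10))  # steps of 0.1 for deciles
```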

Before moving to Pandas, let us try the above concept on an example to understand how Quantile and Decile Ranks are calculated.

Sample question: Find the number in the following set of data where 25 percent of values fall below it and 75 percent fall above.

Data: 32, 47, 55, 62, 74, 77, 86

Step 1: Order the data from smallest to largest. The data in the question is already in ascending order.

Step 2: Count how many observations you have in your data set. This particular data set has 7 items.

Step 3: Convert any percentage to a decimal for “q”. We are looking for the number where 25 percent of the values fall below it, so convert that to .25.

Step 4: Insert your values into the formula:

`ith observation = q (n + 1)`

`ith observation = 0.25 (7 + 1) = 2`

The ith observation is at position 2. The 2nd number in the set is 47, which is the number below which 25 percent of the values fall. We can then rank the numbers from 0 to 3, since we are finding the Quantile Rank. The approach for the Decile Rank is the same, except that q is 0.1.
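
The steps above can be sketched in plain Python. The function name `ith_observation` is an assumption made for this illustration, and the sketch only covers the case where q (n + 1) is a whole number, as it is here. Note that pandas interpolates linearly between neighbouring values by default, so `Series.quantile(0.25)` on the same data returns 51.0 rather than 47; both are common conventions for estimating a quartile.

```python
import pandas as pd


def ith_observation(q, n):
    """1-based position of the q-quantile using q * (n + 1) (hypothetical helper)."""
    return q * (n + 1)


data = sorted([32, 47, 55, 62, 74, 77, 86])  # Step 1: order the data
n = len(data)                                # Step 2: count the observations
q = 0.25                                     # Step 3: percentage as a decimal
position = ith_observation(q, n)             # Step 4: 0.25 * (7 + 1) = 2.0

# This sketch assumes the position is a whole number, as it is here.
value = data[int(position) - 1]              # positions are 1-based
print(position, value)                       # 2.0 47

# pandas uses linear interpolation by default, a different convention.
print(pd.Series(data).quantile(q))           # 51.0
```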

Now let us look at how we can quickly achieve the same in Pandas.

Code for Creating a DataFrame:

```python
# Import pandas
import pandas as pd

# Create a DataFrame
df1 = {'Name': ['George', 'Andrea', 'John', 'Helen',
                'Ravi', 'Julia', 'Justin'],
       'EnglishScore': [62, 47, 55, 74, 32, 77, 86]}

df1 = pd.DataFrame(df1, columns=['Name', 'EnglishScore'])

# Sorting the DataFrame in ascending order of English score
df1.sort_values(by=['EnglishScore'], inplace=True)
```

If we print the above DataFrame, we get the following result:

[Figure: Data Frame]

Now we can find the Quantile Rank using the pandas function qcut() by passing the column to be ranked, the parameter q, which gives the number of quantiles (10 for deciles, 4 for quartiles, etc.), and labels = False to return the bins as integers.

Following is the code for the Quantile Rank:

```python
# Calculating Quantile Rank
df1['QuantileRank'] = pd.qcut(df1['EnglishScore'],
                              q=4, labels=False)
```

If we now print the DataFrame, we can see the new column QuantileRank ranking our data based on the EnglishScore column.

[Figure: Quantile Rank]
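
To see which score ranges sit behind each rank, qcut() can also return the bin edges through its `retbins` parameter. The sketch below rebuilds the scores as a standalone Series so it can run on its own; the variable names `scores`, `ranks` and `edges` are chosen for this example.

```python
import pandas as pd

scores = pd.Series([32, 47, 55, 62, 74, 77, 86], name='EnglishScore')

# retbins=True makes qcut return the bin edges alongside the ranks
ranks, edges = pd.qcut(scores, q=4, labels=False, retbins=True)

print(ranks.tolist())  # [0, 0, 1, 1, 2, 3, 3]
print(edges)           # quartile cut points used as bin edges
```

Each bin is closed on the right, so a score equal to an edge, such as 62, falls in the lower of the two neighbouring bins.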

Similarly, to calculate the Decile Rank we set q = 10:

```python
# Calculating Decile Rank
df1['DecileRank'] = pd.qcut(df1['EnglishScore'],
                            q=10, labels=False)
```

Now if we print our DataFrame, we get the following output:

[Figure: Decile Rank]
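
With only seven scores and ten bins, some decile ranks cannot be assigned to any row. The sketch below, again on a standalone Series with illustrative variable names, lists which ranks are used and which stay empty.

```python
import pandas as pd

scores = pd.Series([32, 47, 55, 62, 74, 77, 86], name='EnglishScore')
decile_ranks = pd.qcut(scores, q=10, labels=False)

# Ranks that appear in the result, and ranks that no score falls into
used = sorted(decile_ranks.unique().tolist())
unused = [r for r in range(10) if r not in used]

print(used)    # [0, 1, 3, 4, 6, 8, 9]
print(unused)  # [2, 5, 7]
```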

That is how we can use the Pandas qcut() function to calculate quantile ranks of various kinds on a column.

The entire code for the above example is given below.

```python
import pandas as pd

# Create a DataFrame
df1 = {'Name': ['George', 'Andrea', 'John', 'Helen',
                'Ravi', 'Julia', 'Justin'],
       'EnglishScore': [62, 47, 55, 74, 32, 77, 86]}

df1 = pd.DataFrame(df1, columns=['Name', 'EnglishScore'])

# Sorting the DataFrame in ascending order of English score.
# Sorting is only for better readability of the data.
df1.sort_values(by=['EnglishScore'], inplace=True)

# Calculating Quantile Rank
df1['QuantileRank'] = pd.qcut(df1['EnglishScore'], q=4, labels=False)

# Calculating Decile Rank
df1['DecileRank'] = pd.qcut(df1['EnglishScore'], q=10, labels=False)

# Printing the DataFrame
print(df1)
```

Last Updated on October 19, 2021 by admin
