Pandas.cut() method in Python
Last Updated: 07 Apr, 2025
The cut() function in Pandas is used to divide or group numerical data into different categories (called bins). This is helpful when we have a list of numbers and want to separate them into meaningful groups.
Sometimes, instead of working with exact numbers, we want to group them into ranges. For example, given students' marks, we might want to categorize each score as "Low", "Average", or "High" rather than list every score:
Python
import pandas as pd

d = {'Student': ['Aryan', 'Prajjwal', 'Vishakshi', 'Brijkant', 'Kareena'],
     'Marks': [77, 72, 19, 68, 45]}
df = pd.DataFrame(d)

# Bin edges give the intervals [0, 50], (50, 75], (75, 100]
bins = [0, 50, 75, 100]
lab = ['Low', 'Average', 'High']

# Use cut() to assign each mark to a labelled bin;
# include_lowest=True makes the first interval include 0
df['Category'] = pd.cut(df['Marks'], bins=bins, labels=lab, include_lowest=True)
print(df)
Output
     Student  Marks Category
0      Aryan     77     High
1   Prajjwal     72  Average
2  Vishakshi     19      Low
3   Brijkant     68  Average
4    Kareena     45      Low
This process is called binning, and it helps in data analysis by making large sets of numbers easier to understand and compare.
Syntax
pd.cut(x, bins, right=True, labels=None, retbins=False, precision=3, include_lowest=False, duplicates="raise", ordered=True)
Parameters:
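- x: The 1D input array or Series to be binned.
- bins: An int (the number of equal-width bins), a sequence of bin edges, or an IntervalIndex.
- right (default: True): If True, each bin includes its right edge and excludes its left edge, for example (0, 50].
- labels: Labels for the returned bins, one per bin. If False, only integer bin indicators are returned.
- retbins (default: False): If True, the bin edges are returned as well.
- precision (default: 3): The precision at which bin labels are stored and displayed.
- include_lowest (default: False): If True, the first interval also includes its left edge.
- duplicates (default: "raise"): If the bin edges are not unique, raise a ValueError, or drop the duplicates when set to "drop".
- ordered (default: True, pandas 1.1.0 and later): Whether the resulting categories are ordered.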
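The effect of right and labels is easiest to see on values that sit exactly on a bin edge. The following is a minimal sketch; the values and variable names are illustrative and not taken from the examples in this article.

```python
import pandas as pd

marks = [50, 75, 100]
edges = [0, 50, 75, 100]

# right=True (default): intervals are (0, 50], (50, 75], (75, 100]
right_closed = pd.cut(marks, bins=edges)

# right=False: intervals are [0, 50), [50, 75), [75, 100)
left_closed = pd.cut(marks, bins=edges, right=False)

# labels=False: return the integer position of each bin instead of intervals
codes = pd.cut(marks, bins=edges, labels=False)

print(right_closed)
print(left_closed)
print(codes)
```

With right=False the value 100 no longer falls inside any interval, so it is reported as NaN.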
Return Type:
1. When applied to a Pandas Series (DataFrame column), it returns a pandas.Series of categorical dtype.
2. When applied to a NumPy array or list, it returns a pandas.Categorical. With labels=False, it returns a numpy.ndarray of integer bin indicators instead.
3. If retbins=True is used, it returns a tuple:
- First element: The Series or Categorical with the binned values.
- Second element: The array of bin edges.
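The tuple returned with retbins=True can be unpacked directly. This sketch assumes an integer bins argument so that pandas computes the edges itself; the sample values are illustrative.

```python
import pandas as pd

values = pd.Series([2, 4, 6, 8, 10])

# bins=2 asks for two equal-width bins; retbins=True also returns the computed edges
binned, edges = pd.cut(values, bins=2, retbins=True)

print(type(binned))  # a Series, because the input was a Series
print(binned)
print(edges)         # three edges delimit the two bins
```

Inspecting the returned edges is useful when bins is an integer, since pandas widens the lowest edge slightly so that the minimum value is included.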
Examples of .cut() method:
Example 1: Categorizing Random Numbers into Bins
Let's create a column of 10 random integers from 1 to 99 (the upper bound of np.random.randint is exclusive) and categorize them into 5 bins:
Python
import pandas as pd
import numpy as np
# Creating a DataFrame with random numbers
df = pd.DataFrame({'number': np.random.randint(1, 100, 10)})
# Using cut() to categorize numbers into 5 bins
df['bins'] = pd.cut(df['number'], bins=[1, 20, 40, 60, 80, 100])
print(df)
# Checking unique bins
print(df['bins'].unique())
Output
   number           bins
0       1            NaN
1      83  (80.0, 100.0]
2      33   (20.0, 40.0]
3      11    (1.0, 20.0]
4      32   (20.0, 40.0]
5       6    (1.0, 20.0]
6       9    (1.0, 20.0]
...
Explanation:
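- The numbers are assigned to the bins (1, 20], (20, 40], and so on.
- Each bin excludes its left edge by default, so the value 1 falls outside every bin and is reported as NaN.
- cut() automatically determines which bin each number belongs to.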
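The NaN for the value 1 in the output above can be avoided with include_lowest=True, which closes the first interval on the left. A minimal sketch with fixed numbers instead of random ones, so the result is reproducible:

```python
import pandas as pd

numbers = pd.Series([1, 11, 33, 83])
edges = [1, 20, 40, 60, 80, 100]

# Default: the first interval is (1, 20], so 1 is not covered
default_bins = pd.cut(numbers, bins=edges)

# include_lowest=True: the first interval also contains its left edge, 1
inclusive_bins = pd.cut(numbers, bins=edges, include_lowest=True)

print(default_bins.isna().tolist())
print(inclusive_bins.isna().tolist())
```

Another option is to start the edges at 0 rather than 1.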
Example 2: Adding Labels to Bins
We can also assign labels to our bins to make the output more readable:
Python
import pandas as pd
import numpy as np
df = pd.DataFrame({'number': np.random.randint(1, 100, 10)})
# Categorizing numbers with labels
df['bins'] = pd.cut(df['number'], bins=[1, 20, 40, 60, 80, 100],
                    labels=['1 to 20', '21 to 40', '41 to 60', '61 to 80', '81 to 100'])
print(df)
# Checking unique bins
print(df['bins'].unique())
Output
   number      bins
0      55  41 to 60
1       8   1 to 20
2      51  41 to 60
3      26  21 to 40
4       5   1 to 20
5       7   1 to 20
6      48  41 to 60
7      50  41 to 60
8      37 ...
Explanation:
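- Instead of interval notation such as (1, 20], we now see labels like '1 to 20' and '41 to 60'.
- The labels are only names: the underlying bins are unchanged, so the bin labelled '1 to 20' is still (1, 20] and does not contain 1.
- This improves readability and makes it easier to analyze categorized data.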
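Once values carry labels, a common next step is to count how many rows fall into each label. The sketch below reuses the marks and labels from the introductory example; the variable names are illustrative.

```python
import pandas as pd

marks = pd.Series([77, 72, 19, 68, 45])
category = pd.cut(marks, bins=[0, 50, 75, 100], labels=['Low', 'Average', 'High'])

# Count rows per label; sort_index() orders the result by bin order, not by count
counts = category.value_counts().sort_index()
print(counts)
```

Because the result of cut() is categorical, a label with no matching rows still appears in the counts with a value of 0.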
Example 3: Applying pd.cut() to a NumPy Array
Python
import numpy as np
import pandas as pd
n = np.array([10, 25, 45, 68, 90])
b_res = pd.cut(n, bins=[0, 20, 50, 100])
print(b_res)
print(type(b_res))
Output
[(0, 20], (20, 50], (20, 50], (50, 100], (50, 100]]
Categories (3, interval[int64, right]): [(0, 20] < (20, 50] < (50, 100]]
<class 'pandas.core.arrays.categorical.Categorical'>
The result is a pandas Categorical rather than a NumPy array. Each element is the interval that the corresponding input value falls into.
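As the printed type shows, the returned object is a pandas Categorical, which exposes its bins and the per-value bin positions as attributes. A brief sketch using the same array and edges as above:

```python
import numpy as np
import pandas as pd

n = np.array([10, 25, 45, 68, 90])
b_res = pd.cut(n, bins=[0, 20, 50, 100])

# .categories holds the interval for each bin
print(b_res.categories)

# .codes holds the position of the bin each value landed in
print(b_res.codes)
```

Here the codes are 0, 1, 1, 2, 2, matching the three intervals in order.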