How to Order PySpark DataFrame by Multiple Columns?

Last Updated : 17 Jun, 2021

In this article, we are going to order a PySpark DataFrame by multiple columns using the orderBy() function. Ordering the rows means arranging them in ascending or descending order. First, we create the DataFrame from a nested list.

orderBy() sorts the DataFrame by one or more columns. By default, it sorts in ascending order.

Syntax: orderBy(*cols, ascending=True)

Parameters:
cols: columns by which sorting needs to be performed.
ascending: Boolean value indicating whether sorting is to be done in ascending order.

Example program to create a DataFrame with student data:

Python3

# importing module
import pyspark

# importing SparkSession from the pyspark.sql module
from pyspark.sql import SparkSession

# creating a SparkSession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students' data
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]

# specify column names
columns = ['student ID', 'student NAME', 'college']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

print("Actual data in dataframe")

# show dataframe
dataframe.show()

Output:

Actual data in dataframe
+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         1|      sravan| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+

Example 1: Python program to sort the DataFrame by two columns in descending order using the orderBy() function.

Python3

# sort the dataframe by two columns
# in descending order using orderBy()
dataframe.orderBy(['student ID', 'student NAME'],
                  ascending=False).show()

Output:

+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         5|     gnanesh|    iit|
|         4|     sridevi| vignan|
|         3|      rohith|   vvit|
|         2|      ojaswi|   vvit|
|         1|      sravan| vignan|
|         1|      sravan| vignan|
+----------+------------+-------+

Example 2: Python program to sort the DataFrame by two columns in ascending order using the orderBy() function.

Python3

# sort the dataframe by two columns
# in ascending order using orderBy()
dataframe.orderBy(['student ID', 'student NAME'],
                  ascending=True).show()

Output:

+----------+------------+-------+
|student ID|student NAME|college|
+----------+------------+-------+
|         1|      sravan| vignan|
|         1|      sravan| vignan|
|         2|      ojaswi|   vvit|
|         3|      rohith|   vvit|
|         4|     sridevi| vignan|
|         5|     gnanesh|    iit|
+----------+------------+-------+

Author: gottumukkalabobby
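Beyond a single Boolean, orderBy() also accepts a list of Booleans for `ascending`, one per column, so each column can get its own direction, e.g. dataframe.orderBy(['student ID', 'student NAME'], ascending=[True, False]). The multi-column comparison itself is just lexicographic tuple ordering: compare by the first column, and fall back to the next column on ties. As a minimal sketch of that rule (not PySpark itself), the plain-Python snippet below reproduces Example 1's two-column descending sort with the built-in sorted(), using the same student list as above, so it runs without a Spark session.

Python3

```python
# Plain-Python sketch of a two-column descending sort, mirroring
# dataframe.orderBy(['student ID', 'student NAME'], ascending=False).
# Same student rows as in the article; no Spark session required.

data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]

# Sorting on a (student ID, student NAME) tuple compares IDs first and
# falls back to names on ties -- the same lexicographic rule orderBy uses.
rows_desc = sorted(data, key=lambda row: (row[0], row[1]), reverse=True)

for row in rows_desc:
    print(row)
# First row is ['5', 'gnanesh', 'iit'], matching Example 1's output.
```

In PySpark itself, the same per-column control can also be written with column expressions, for example dataframe.orderBy(dataframe['student ID'].asc(), dataframe['student NAME'].desc()).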