
Imports import findspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import *

Locating the Spark installation findspark.init()
Initialising a SparkSession spark = SparkSession.builder.appName("App01").getOrCreate()
Reading data from a file df = spark.read.format('csv').option('header', True).load('fpath.csv')
Loading/Creating DataFrame from CSV df = spark.read.csv(fpath, header=True)
Loading/Creating DataFrame from JSON df = spark.read.json(fpath)
Loading/Creating DataFrame from Parquet df = spark.read.parquet(fpath)
Loading/Creating DataFrame from RDDs df = rdd1.toDF(["ID", "Name"])
Creating DataFrame from data data = [(1, "Alice", 28), (2, "Bob", 22)]
columns = ["ID", "Name", "Age"]
df = spark.createDataFrame(data, columns)
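Putting the pieces above together, a minimal end-to-end sketch (the app name and sample data are illustrative placeholders):

import findspark
findspark.init()
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("App01").getOrCreate()

# Build a small DataFrame from in-memory data and inspect it
data = [(1, "Alice", 28), (2, "Bob", 22)]
df = spark.createDataFrame(data, ["ID", "Name", "Age"])
df.show()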
Get DataFrame row count df.count()
Get DataFrame columns df.columns
Get DataFrame schema df.printSchema()
Displaying Data df.show()
Selecting columns df1 = df.select("Name", "Age")
Filtering data by Elements using filter() df1 = df.filter(df.Age > 18)
Filtering data by Elements using expr() df1 = df.filter(expr("Age > 18"))
Filtering data by Elements using where() df1 = df.where(df.Age > 18)
Filtering data by Elements using SQL-like syntax df1 = df.where("Age > 18")
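All four element-level filters above are interchangeable; a quick sketch on the sample DataFrame (expr comes from pyspark.sql.functions):

from pyspark.sql.functions import expr

# Each line returns the same rows (Age > 18); pick one style and stay consistent
df.filter(df.Age > 18).show()
df.filter(expr("Age > 18")).show()
df.where(df.Age > 18).show()
df.where("Age > 18").show()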
Aggregating data from pyspark.sql.functions import avg
df_aggregated = df.groupBy('colname1').agg(avg('colname2'))
Joining data df_joined = df1.join(df2, df1.column_name == df2.column_name)
df_joined = df1.join(df2, join_condition, "full")
df_joined = df1.join(df2, join_condition, "left")
df_joined = df1.join(df2, join_condition, "right")
df_joined = df1.join(df2, join_condition, "inner")
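A sketch of the four join types on two small DataFrames (emp and dept are hypothetical names, not from the original):

emp = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["ID", "Name"])
dept = spark.createDataFrame([(1, "HR"), (3, "IT")], ["DeptID", "Dept"])
join_condition = emp.ID == dept.DeptID

emp.join(dept, join_condition, "inner").show()  # only IDs present on both sides
emp.join(dept, join_condition, "left").show()   # all emp rows; nulls where no dept match
emp.join(dept, join_condition, "right").show()  # all dept rows; nulls where no emp match
emp.join(dept, join_condition, "full").show()   # all rows from both sides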
Column Adding dfnew = df.withColumn("City", lit("Bangalore"))
Column Renaming dfnew = df.withColumnRenamed("Name", "Full_Name")
Column Dropping dfnew = df.drop("Age")
Transform data using Functions df1 = df.withColumn("Status", when(col("Age") > 25, "Adult").otherwise("Young"))
Transform data using Expressions df1 = df.withColumn("ID_Name", concat(col("ID"), lit("_"), col("Name")))
Transform data using SQL-like expr df1 = df.withColumn("AgeGroup", expr("CASE WHEN Age <= 25 THEN 'Young' ELSE 'Adult' END"))
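These transformations chain naturally; a minimal sketch on the sample DataFrame (when, col, concat and lit come from pyspark.sql.functions; the cast to string is an assumption, since concat expects string inputs):

from pyspark.sql.functions import when, col, concat, lit

df1 = (df
       .withColumn("Status", when(col("Age") > 25, "Adult").otherwise("Young"))
       .withColumn("ID_Name", concat(col("ID").cast("string"), lit("_"), col("Name"))))
df1.show()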
Writing data to a file df.write.format('csv').option('header', True).save('file_path.csv')

Aggregation using Sum, Avg, Count, Min, Max df1 = df.select(avg("Salary")).collect()[0][0]


Grouping data elements df_aggregated = df.groupBy('colname1').agg(avg('colname2'))
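All the listed aggregates can run in one groupBy pass; a sketch (Dept and Salary are hypothetical columns, and the F alias avoids shadowing Python built-ins):

from pyspark.sql import functions as F

# One shuffle computes every aggregate per Dept group
df.groupBy("Dept").agg(
    F.sum("Salary"), F.avg("Salary"), F.count("Salary"),
    F.min("Salary"), F.max("Salary")).show()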
Handling Missing data
Dropping Rows with Null Values df.dropna()
Filling Null Values df.fillna({'Age': df.select(avg('Age')).first()[0]})
Handling Categorical Missing Data df.fillna({'Gender': 'Unknown'})
Adding an Indicator Column df.withColumn('Age_missing', df['Age'].isNull())
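Those steps combined, as a sketch on a hypothetical DataFrame containing nulls (flag missing values before filling them, or the indicator is lost):

from pyspark.sql.functions import avg

people = spark.createDataFrame(
    [(1, "Alice", 28.0, "F"), (2, "Bob", None, None)],
    ["ID", "Name", "Age", "Gender"])

mean_age = people.select(avg("Age")).first()[0]  # compute the fill value first
cleaned = (people
           .withColumn("Age_missing", people["Age"].isNull())  # flag before filling
           .fillna({"Age": mean_age, "Gender": "Unknown"}))
cleaned.show()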

Date and Time Stamp from pyspark.sql.functions import current_date, current_timestamp, datediff, months_between, date_add, date_sub


Current Date df.withColumn("CurrentDate", current_date())
Current Time Stamp df.withColumn("CurrentTimestamp", current_timestamp())
Date Difference df.withColumn("DaysSince", datediff(current_date(), col("Date")))
Month Between df.withColumn("MonthBetween", months_between(current_date(), col("Date")))
Date Addition df.withColumn("Plus10Days", date_add(col("Date"), 10))
Date Subtraction df.withColumn("Minus5Days", date_sub(col("Date"), 5))
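A sketch on a hypothetical DataFrame with a Date column (to_date parses the string into a date first; the sample dates are placeholders):

from pyspark.sql.functions import to_date, col, current_date, datediff, date_add

orders = spark.createDataFrame([("2024-01-15",), ("2024-06-01",)], ["DateStr"])
orders = orders.withColumn("Date", to_date(col("DateStr")))  # default yyyy-MM-dd format

# datediff(end, start) counts days from start to end
orders.withColumn("DaysSince", datediff(current_date(), col("Date"))) \
      .withColumn("Plus10Days", date_add(col("Date"), 10)) \
      .show()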
