Complete Guide To Spark Memory Management 1726709042
🚀1. Memory Management 🚀
Apache Spark is widely known for its powerful in-memory computations, which can
significantly speed up big data processing. However, to truly harness this power,
understanding how Spark manages memory is crucial. Efficient memory management is the
key to optimizing Spark jobs and avoiding costly out-of-memory (OOM) errors or
performance bottlenecks.
Each executor in Spark has a fixed amount of memory allocated to it, which is
managed by the JVM (Java Virtual Machine). Spark divides this memory into
different regions, mainly:
● Execution Memory: holds intermediate data for shuffles, joins, sorts, and aggregations.
● Storage Memory: holds cached or persisted data.
These two memory areas share the same pool, which is dynamically allocated
based on the workload. Let's break this down further.
🔥2. Executor Memory Breakdown 🔥
The memory allocated to each executor is split into several regions, each
responsible for different tasks. For a 4GB executor with the default settings, the
memory would be divided roughly as follows:
● Reserved Memory: a fixed 300MB for Spark's internal objects.
● User Memory: roughly 1.5GB for user-defined objects and data structures.
● Spark Memory: roughly 2.2GB, shared between Execution and Storage Memory.
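This split can be sketched as simple arithmetic. The sketch below assumes the default values spark.memory.fraction=0.6 and spark.memory.storageFraction=0.5, and the fixed 300MB of Reserved Memory; the numbers are approximate, not exact runtime figures:

```python
# Back-of-the-envelope breakdown for a 4GB (4096MB) executor,
# assuming default spark.memory.fraction=0.6 and
# spark.memory.storageFraction=0.5.

heap_mb = 4096
reserved_mb = 300                       # fixed Reserved Memory
usable_mb = heap_mb - reserved_mb       # 3796 MB left after the reserve
spark_mem_mb = usable_mb * 0.6          # ~2278 MB for execution + storage
user_mem_mb = usable_mb * 0.4           # ~1518 MB of User Memory
storage_mb = spark_mem_mb * 0.5         # ~1139 MB of Storage Memory
execution_mb = spark_mem_mb * 0.5       # ~1139 MB of Execution Memory

print(round(spark_mem_mb), round(user_mem_mb), round(storage_mb))
# -> 2278 1518 1139
```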
💡2.1 Reserved Memory💡:
Reserved Memory is a fixed amount (300MB per executor) that Spark sets aside for
its own internal objects and bookkeeping.
Example of Metadata/Bookkeeping:
● Task Metadata: Keeps track of each task's progress, including its start/end
times, input/output data, and shuffle metrics.
● Partition Metadata: Maintains information about how data is split across
partitions on different nodes.
These small but important pieces of metadata ensure that Spark can efficiently
coordinate tasks across the cluster. Reserved memory acts as a safety buffer,
preventing the entire system from crashing due to insufficient memory.
💡2.2 User Memory💡:
User Memory is the part of memory available for user-defined objects, data
structures, and transformations. Spark does not directly manage this memory,
leaving it to the user. You might use user memory when defining custom
aggregations, UDFs (User-Defined Functions), or creating your own data structures
in transformations.
🧠Formula for User Memory Calculation🧠:
User Memory is calculated as the memory leftover after Spark allocates memory for Spark
Memory and Reserved Memory. Here’s the formula:
User Memory = (Executor Memory − Reserved Memory) × (1 − spark.memory.fraction)
For a 4GB executor with the default spark.memory.fraction of 0.6, that is
(4096MB − 300MB) × 0.4 ≈ 1518MB, or roughly 1.5GB.
Example: Using a UDF (User-Defined Function):
from pyspark.sql.functions import udf

def multiply(x):
    return x * 2

multiply_udf = udf(multiply)
df.withColumn("new_col", multiply_udf(df["existing_col"])).show()
The intermediate data and function outputs consume User Memory, which is why it’s
important to be cautious when using UDFs in large-scale jobs.
Custom Aggregation Logic: If you write your own logic using
mapPartitions() for aggregation, you might need to maintain custom objects
(like hash maps) to keep track of intermediate results.
def custom_aggregate(iterator):
    result = {}
    for record in iterator:          # iterate over the partition's records
        key = record[0]
        value = record[1]
        if key not in result:
            result[key] = value      # first time we see this key
        else:
            result[key] += value     # accumulate into the running total
    return iter(result.items())
In this example, the result dictionary stores intermediate results during the
aggregation. The memory used by this custom object (a Python dictionary) comes
from User Memory.
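Because the per-partition function is plain Python, you can exercise it outside Spark by feeding it an ordinary iterator of (key, value) records, which makes the User Memory cost of the dictionary easy to reason about:

```python
# The per-partition aggregation logic, runnable without Spark.
def custom_aggregate(iterator):
    result = {}                      # intermediate state lives in User Memory
    for record in iterator:
        key, value = record[0], record[1]
        if key not in result:
            result[key] = value
        else:
            result[key] += value
    return iter(result.items())

records = [("a", 1), ("b", 2), ("a", 3)]
print(dict(custom_aggregate(iter(records))))   # -> {'a': 4, 'b': 2}
```

In a real job you would pass the same function to rdd.mapPartitions(custom_aggregate), and each executor would hold one such dictionary per partition.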
💡2.3 Spark Memory💡:
Spark Memory is the core part of executor memory, split into two sections:
Execution Memory and Storage Memory. These sections handle Spark's internal
operations and can dynamically borrow from each other based on the task's needs.
🌟Execution Memory:🌟
This part is responsible for storing intermediate data during tasks like shuffles,
joins, and aggregations. For example, if Spark is performing a sort or join, it uses
execution memory to hold temporary buffers for sorting data before writing it to disk
or sending it over the network.
🌟Storage Memory:🌟
This section is used to cache data for reuse. When you call cache() or persist()
on a DataFrame, Spark stores the cached data in Storage Memory for faster
retrieval during subsequent actions.
Key Point: Spark uses a unified memory model, meaning execution and storage
memory share the same pool. If one memory section needs more space, it can
borrow from the other as long as there’s available memory.
Storage memory is calculated based on the Usable Memory and the configuration
of spark.memory.storageFraction:
Storage Memory = (Executor Memory − Reserved Memory) × spark.memory.fraction × spark.memory.storageFraction
For our 4GB executor with the defaults (0.6 and 0.5), that is
(4096MB − 300MB) × 0.6 × 0.5 ≈ 1139MB. Thus, approximately 1.1GB of memory
will be available for storage, i.e., caching and persisting DataFrames or RDDs.
🔥3. The Dynamic Occupancy Mechanism 🔥
Let’s consider a scenario where you're caching several large DataFrames. As the job
progresses:
● Storage Memory is full: Spark may start spilling cached data to disk if it runs
out of storage memory. However, if Execution Memory is underutilized (e.g.,
there are no ongoing shuffle operations), Spark can borrow memory from the
execution pool to hold more cached data.
Conversely:
● Execution Memory is under pressure: if a large shuffle or join needs more
space and cached blocks are occupying it, Spark can evict cached blocks from
Storage Memory to make room for execution.
This dynamic sharing between the two memory pools is referred to as the dynamic
occupancy mechanism.
Let’s assume you're running a job that needs to cache several large DataFrames. If
the Storage Memory is fully utilized, Spark can borrow from the Execution Memory
pool (assuming there are no active shuffle operations) before spilling cached data to
disk. This reduces the frequency of disk IO operations, enhancing the performance
of the job.
Similarly, if a job requires significant Execution Memory for large shuffles or joins,
and the Storage Memory isn’t fully utilized (e.g., minimal caching), Spark will borrow
from the Storage Memory pool to complete the task without spilling intermediate
data to disk.
This dynamic allocation of memory resources ensures that Spark can flexibly
handle varying workloads without hard boundaries between memory pools, resulting
in better performance and fewer memory-related issues.
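The borrowing rules can be illustrated with a toy model. This is a deliberately simplified sketch, not Spark's actual implementation (real Spark also protects a storageFraction-sized region of cached data from eviction): storage may only take free space, while execution may evict cached blocks to grow.

```python
# Toy model of the unified memory pool (NOT Spark's real code).
class UnifiedPool:
    def __init__(self, total):
        self.total = total      # total Spark Memory (execution + storage)
        self.execution = 0      # units used by shuffles/joins/sorts
        self.storage = 0        # units used by cached blocks

    def acquire_storage(self, n):
        """Caching may only use free space; it never evicts execution data."""
        if self.execution + self.storage + n <= self.total:
            self.storage += n
            return True
        return False            # caller would spill the block to disk instead

    def acquire_execution(self, n):
        """Execution may evict cached blocks to make room for itself."""
        free = self.total - self.execution - self.storage
        if n > free:
            evicted = min(self.storage, n - free)
            self.storage -= evicted          # drop cached blocks
        if self.execution + self.storage + n <= self.total:
            self.execution += n
            return True
        return False            # still not enough room -> spill to disk

pool = UnifiedPool(total=100)
pool.acquire_storage(80)        # caching fills most of the pool
pool.acquire_execution(50)      # a shuffle evicts 30 units of cached data
print(pool.storage, pool.execution)   # -> 50 50
```

The asymmetry (execution evicts storage, but not the reverse) mirrors the behavior described above: cached data can be recomputed or re-read, while in-flight shuffle buffers cannot.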
🔥4. Key Memory Configuration Settings 🔥
4.1 spark.executor.memory
● What it does: Sets the total amount of memory allocated to each executor.
● Example: spark.executor.memory=8g will allocate 8GB to each
executor.
4.2 spark.memory.fraction
● What it does: Sets the fraction of usable memory (heap minus Reserved
Memory) given to Spark Memory (execution + storage). The remainder
becomes User Memory.
● Default: 0.6, meaning 60% of usable memory goes to Spark Memory.
4.3 spark.memory.storageFraction
● What it does: Sets the fraction of Spark memory allocated to storage tasks
(like caching). The remainder goes to execution memory.
● Default: 0.5, meaning storage and execution memory are evenly split.
4.4 spark.executor.memoryOverhead
● What it does: Allocates additional memory for overhead tasks (e.g., Python
processes in PySpark). This prevents executors from running out of memory
due to non-JVM tasks.
● Example: Increasing spark.executor.memoryOverhead is useful when
using PySpark or other non-JVM languages.
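Putting these settings together, a submission might look like the following sketch (the script name your_job.py is a placeholder, and the values should be tuned to your cluster):

```shell
# Example spark-submit invocation combining the settings above.
spark-submit \
  --conf spark.executor.memory=8g \
  --conf spark.memory.fraction=0.6 \
  --conf spark.memory.storageFraction=0.5 \
  --conf spark.executor.memoryOverhead=2g \
  your_job.py
```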
A quick recap of the three main regions:
● User Memory: holds user-defined objects, data structures, and UDF
intermediates; Spark does not manage it.
● Execution Memory: holds intermediate data for shuffles, joins, sorts, and
aggregations.
● Storage Memory: holds cached or persisted DataFrames and RDDs.
7. Conclusion
Spark memory management can seem complex, but by understanding how memory
is divided and managed between tasks, storage, and custom operations, you can
optimize your jobs for performance and stability. The key is to balance execution
and storage memory, carefully manage user memory (especially with UDFs and
custom objects), and adjust memory configurations to fit your workloads.
With practical examples and detailed breakdowns, you now have the tools to
diagnose and tune most scenarios involving Spark memory management.