
Spill timing differs between the Shuffle Write and Shuffle Read stages #31

@MingRongXi

Description


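Hi Lijie, I have a question I'd like to ask you. While reading the Spark source code, I noticed that Shuffle Write first inserts a value into the map and then checks whether it needs to spill, whereas Shuffle Read first checks whether it needs to spill and then inserts the value. As I understand it, the Shuffle Read order carries a lower risk of running out of memory, while Shuffle Write could run out of memory when the map expands (grows its capacity) during an insert. Do you know why Spark was designed this way?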
[Screenshot: Shuffle Map (write side) source code]
[Screenshot: Shuffle Read source code]
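To make the two orderings concrete, here is a toy Python model. Everything in it (`ToyMap`, `run`, the byte sizes, the doubling rule, the threshold) is a hypothetical simplification, not Spark's code; the real loops are in Scala, likely `ExternalSorter.insertAll` (insert, then `maybeSpillCollection`) and `ExternalAppendOnlyMap.insertAll` (`maybeSpill`, then insert), which are presumably what the screenshots show. The sketch feeds the same records through an insert-then-check loop (Shuffle Write) and a check-then-insert loop (Shuffle Read) and reports the number of spills, the largest map, and the largest footprint during a resize, when the old and new arrays coexist.

```python
# Toy model (hypothetical; not Spark's actual code) of the two loop orders:
#   Shuffle Write: insert the record, then check whether to spill
#   Shuffle Read:  check whether to spill, then insert the record
ENTRY_BYTES = 8        # assumed bytes per slot
INITIAL_CAPACITY = 4   # assumed starting number of slots
THRESHOLD = 100        # assumed spill threshold in bytes


class ToyMap:
    """Hypothetical in-memory map whose capacity doubles when it is full."""

    def __init__(self):
        self.capacity = INITIAL_CAPACITY
        self.size = 0
        self.resize_peak = self.bytes()  # largest footprint seen, incl. resizes

    def bytes(self):
        return self.capacity * ENTRY_BYTES

    def insert(self):
        if self.size == self.capacity:
            old_bytes = self.bytes()
            self.capacity *= 2
            # While entries are copied, the old and new arrays are both alive.
            self.resize_peak = max(self.resize_peak, old_bytes + self.bytes())
        self.size += 1


def should_spill(m):
    return m.bytes() >= THRESHOLD


def run(n_records, check_first):
    """Return (spills, largest map in bytes, largest footprint during a resize)."""
    maps = [ToyMap()]  # every map ever used; a spill starts a fresh one
    spills = 0
    for _ in range(n_records):
        if check_first:                 # Shuffle Read order
            if should_spill(maps[-1]):
                spills += 1
                maps.append(ToyMap())
            maps[-1].insert()
        else:                           # Shuffle Write order
            maps[-1].insert()
            if should_spill(maps[-1]):
                spills += 1
                maps.append(ToyMap())
    # Capacity never shrinks, so each map's final size is its largest.
    largest = max(m.bytes() for m in maps)
    resize_peak = max(m.resize_peak for m in maps)
    return spills, largest, resize_peak


print("write:", run(20, check_first=False))  # (2, 128, 192)
print("read: ", run(20, check_first=True))   # (2, 128, 192)
```

Under these assumptions both orders reach the same peaks, because the resize happens inside the insert and neither check can run in the middle of it. The orders differ only in which record triggers a spill and in what happens at the end: with 9 records, the write order spills the grown map once, while the read order leaves it in memory. Spark's real accounting (sampled size estimates, asking the memory manager for more memory before deciding to spill) is not modeled here.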
