Commit f7c7992

thvasilo authored and srowen committed
[EC2] [SPARK-6188] Instance types can be mislabeled when re-starting cluster with default arguments
As described in https://issues.apache.org/jira/browse/SPARK-6188 and discovered in https://issues.apache.org/jira/browse/SPARK-5838.

When re-starting a cluster, if the user does not provide the instance types, which is currently the recommended behavior in the docs, the instances will be assigned the default type m1.large. This then affects the setup of the machines.

This commit solves the problem by getting the instance types from the existing instances and overwriting the default options.

EDIT: Further clarification of the issue:

In short, while the instances themselves are the same as launched, their setup is done assuming the default instance type, m1.large.

This means that the machines are assumed to have 2 disks, which leads to the problems described in issue [5838](https://issues.apache.org/jira/browse/SPARK-5838): machines that have one disk end up with shuffle spills in the small (8GB) snapshot partitions, which quickly fill up and result in failing jobs due to "No space left on device" errors.

Other instance-specific settings that are set in the spark_ec2.py script are likely to be wrong as well.

Author: Theodore Vasiloudis <[email protected]>
Author: Theodore Vasiloudis <[email protected]>

Closes apache#4916 from thvasilo/SPARK-6188]-Instance-types-can-be-mislabeled-when-re-starting-cluster-with-default-arguments and squashes the following commits:

6705b98 [Theodore Vasiloudis] Added comment to clarify setting master instance type to the empty string.
a3d29fe [Theodore Vasiloudis] More trailing whitespace
7b32429 [Theodore Vasiloudis] Removed trailing whitespace
3ebd52a [Theodore Vasiloudis] Make sure that the instance type is correct when relaunching a cluster.
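To illustrate why a mislabeled instance type matters during setup, here is a minimal sketch. It is not the code from spark_ec2.py: the `DISKS_BY_TYPE` table, the `assumed_num_disks` helper, and the `use_existing` flag are hypothetical names, and the table lists only a few instance types for illustration.

```python
# Hypothetical lookup table: ephemeral disks per instance type.
# Only a few types are listed; this is not the table used by spark_ec2.py.
DISKS_BY_TYPE = {
    "m1.large": 2,
    "m3.large": 1,
    "r3.large": 1,
}

DEFAULT_INSTANCE_TYPE = "m1.large"


def assumed_num_disks(requested_type, actual_type, use_existing=False):
    """Return the disk count that cluster setup would assume.

    requested_type: type passed on the command line (None if omitted).
    actual_type:    type of the instance that is really running.
    use_existing:   if True, trust the running instance over the default.
    """
    if use_existing:
        effective_type = actual_type
    else:
        effective_type = requested_type or DEFAULT_INSTANCE_TYPE
    return DISKS_BY_TYPE[effective_type]


# Restarting an m3.large cluster without passing the instance type:
print(assumed_num_disks(None, "m3.large"))                     # default is assumed
print(assumed_num_disks(None, "m3.large", use_existing=True))  # real type is used
```

With the default in effect, a one-disk machine is configured as if it had two disks, which is the situation the commit message describes.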
1 parent 55b1b32 commit f7c7992

1 file changed: +11 -0 lines changed

ec2/spark_ec2.py

Lines changed: 11 additions & 0 deletions
@@ -1307,6 +1307,17 @@ def real_main():
             cluster_instances=(master_nodes + slave_nodes),
             cluster_state='ssh-ready'
         )
+
+        # Determine types of running instances
+        existing_master_type = master_nodes[0].instance_type
+        existing_slave_type = slave_nodes[0].instance_type
+        # Setting opts.master_instance_type to the empty string indicates we
+        # have the same instance type for the master and the slaves
+        if existing_master_type == existing_slave_type:
+            existing_master_type = ""
+        opts.master_instance_type = existing_master_type
+        opts.instance_type = existing_slave_type
+
         setup_cluster(conn, master_nodes, slave_nodes, opts, False)
 
     else:
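
The added lines can be tried in isolation. The sketch below wraps the same logic in a function and runs it against stand-in objects. The function name `apply_existing_instance_types` and the use of `SimpleNamespace` (Python 3) in place of real instance and option objects are assumptions made for illustration; they are not part of the commit.

```python
from types import SimpleNamespace


def apply_existing_instance_types(opts, master_nodes, slave_nodes):
    """Overwrite the instance-type options with the types of running nodes.

    Like the diff, this reads only the first master and the first slave,
    so it assumes each group is homogeneous and non-empty.
    """
    existing_master_type = master_nodes[0].instance_type
    existing_slave_type = slave_nodes[0].instance_type
    # An empty master type means "same instance type as the slaves".
    if existing_master_type == existing_slave_type:
        existing_master_type = ""
    opts.master_instance_type = existing_master_type
    opts.instance_type = existing_slave_type
    return opts


# Stand-ins for the parsed options and the running instances.
opts = SimpleNamespace(instance_type="m1.large", master_instance_type="")
masters = [SimpleNamespace(instance_type="m3.large")]
slaves = [SimpleNamespace(instance_type="r3.xlarge")]

apply_existing_instance_types(opts, masters, slaves)
print(opts.master_instance_type, opts.instance_type)
```

Indexing with `[0]` raises `IndexError` if either list is empty, so a caller outside this code path may want to check for that first.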
