[SPARK-6079] Use index to speed up StatusTracker.getJobIdsForGroup() #4830

JoshRosen · 2015-02-28T07:40:05Z

StatusTracker.getJobIdsForGroup() is implemented via a linear scan over a HashMap rather than using an index, which can be expensive when there are many (e.g. thousands of) retained jobs.

This patch adds a new map to JobProgressListener in order to speed up these lookups.
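
As a rough illustration of the difference between the two lookup strategies, here is a minimal sketch. It is not the code from this patch: `JobInfo`, `JobIndex`, and both lookup method names are hypothetical, and only `jobIdToData` and `jobGroupToJobIds` echo names that appear in this thread.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the listener's per-job bookkeeping; not Spark's actual class.
final case class JobInfo(jobId: Int, jobGroup: Option[String])

class JobIndex {
  // Primary store: job ID -> job data.
  val jobIdToData = mutable.HashMap.empty[Int, JobInfo]
  // Secondary index: job group -> IDs of the retained jobs in that group.
  val jobGroupToJobIds = mutable.HashMap.empty[Option[String], mutable.HashSet[Int]]

  def addJob(job: JobInfo): Unit = {
    jobIdToData(job.jobId) = job
    jobGroupToJobIds.getOrElseUpdate(job.jobGroup, mutable.HashSet.empty[Int]) += job.jobId
  }

  // Linear scan: touches every retained job on each lookup.
  def jobIdsForGroupByScan(group: Option[String]): Set[Int] =
    jobIdToData.values.filter(_.jobGroup == group).map(_.jobId).toSet

  // Indexed lookup: one hash lookup, then a copy of the matching IDs.
  def jobIdsForGroupByIndex(group: Option[String]): Set[Int] =
    jobGroupToJobIds.get(group).map(_.toSet).getOrElse(Set.empty[Int])
}
```

Both methods return the same set; the indexed version only does work proportional to the size of the requested group, at the cost of maintaining a second map whenever jobs are added or removed.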

SparkQA · 2015-02-28T07:42:28Z

Test build #28122 has started for PR 4830 at commit 2c49614.

This patch merges cleanly.

SparkQA · 2015-02-28T08:41:10Z

Test build #28122 has finished for PR 4830 at commit 2c49614.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-28T08:41:14Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28122/
Test FAILed.

JoshRosen · 2015-02-28T10:08:32Z

Jenkins, retest this please.

SparkQA · 2015-02-28T10:13:01Z

Test build #28125 has started for PR 4830 at commit 2c49614.

This patch merges cleanly.

SparkQA · 2015-02-28T11:32:57Z

Test build #28125 has finished for PR 4830 at commit 2c49614.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-02-28T11:33:01Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28125/
Test PASSed.

JoshRosen · 2015-03-09T23:21:28Z

Jenkins, retest this please.

JoshRosen · 2015-03-09T23:21:50Z

@pwendell @andrewor14, could one of you take a quick look at this patch? Should be pretty straightforward.

SparkQA · 2015-03-09T23:22:41Z

Test build #28412 has started for PR 4830 at commit 2c49614.

This patch merges cleanly.

SparkQA · 2015-03-10T00:50:25Z

Test build #28412 has finished for PR 4830 at commit 2c49614.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-03-10T00:50:29Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28412/
Test PASSed.

sryza · 2015-03-11T22:36:43Z

core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala

@@ -109,7 +111,8 @@ class JobProgressListener(conf: SparkConf) extends SparkListener with Logging {
      "failedJobs" -> failedJobs.size,
      "completedStages" -> completedStages.size,
      "skippedStages" -> skippedStages.size,
-      "failedStages" -> failedStages.size
+      "failedStages" -> failedStages.size,
+      "jobGroupToJobIds" -> jobGroupToJobIds.values.map(_.size).sum


IIUC, based on when elements are removed, the size of this should be the same as the size of jobIdToData. Why not place jobGroupToJobIds alongside it in getSizesOfSoftSizeLimitedCollections?

JoshRosen · 2015-03-24

Good point; I'll make this change.
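
The review point above rests on an invariant: if every retained job is indexed under exactly one group, and index entries are removed whenever jobs are evicted, then the summed sizes of the per-group sets equal the number of retained jobs. The sketch below illustrates that under stated assumptions. `GroupIndexedJobs`, `jobIdToGroup`, `trimTo`, and `indexedJobCount` are hypothetical names, and the oldest-first eviction policy is an assumption made for the example, not a description of Spark's behavior.

```scala
import scala.collection.mutable

// Hypothetical names throughout; a sketch of the invariant, not Spark's actual code.
class GroupIndexedJobs {
  // Insertion-ordered so that the oldest jobs can be evicted first (assumed policy).
  val jobIdToGroup = mutable.LinkedHashMap.empty[Int, Option[String]]
  val jobGroupToJobIds = mutable.HashMap.empty[Option[String], mutable.HashSet[Int]]

  def addJob(jobId: Int, group: Option[String]): Unit = {
    jobIdToGroup(jobId) = group
    jobGroupToJobIds.getOrElseUpdate(group, mutable.HashSet.empty[Int]) += jobId
  }

  // Evict the oldest jobs until at most `retained` remain, updating the index as well.
  def trimTo(retained: Int): Unit = {
    val excess = math.max(0, jobIdToGroup.size - retained)
    // Materialize the IDs first so the map is not mutated while being iterated.
    val toRemove = jobIdToGroup.keys.take(excess).toList
    toRemove.foreach { jobId =>
      jobIdToGroup.remove(jobId).foreach { group =>
        jobGroupToJobIds.get(group).foreach { ids =>
          ids -= jobId
          if (ids.isEmpty) jobGroupToJobIds.remove(group) // drop groups with no jobs left
        }
      }
    }
  }

  // Same computation as the line added in the diff hunk above.
  def indexedJobCount: Int = jobGroupToJobIds.values.map(_.size).sum
}
```

With this bookkeeping, `indexedJobCount` tracks `jobIdToGroup.size` before and after trimming, which is why the two collections can be reported side by side.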

andrewor14 · 2015-03-24T21:50:42Z

LGTM for the most part. I think this is mergeable as is but I would like to see the for loop written in a more readable way.

SparkQA · 2015-03-25T21:53:16Z

Test build #29185 has started for PR 4830 at commit e39c5c7.

This patch merges cleanly.

SparkQA · 2015-03-25T23:11:08Z

Test build #29185 has finished for PR 4830 at commit e39c5c7.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2015-03-25T23:11:12Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29185/
Test PASSed.

andrewor14 · 2015-03-26T00:39:23Z

Thanks for the new comments @JoshRosen. The latest version is really easy to follow. Merging into master.

JoshRosen · 2015-04-02T20:13:07Z

Also backported this to branch-1.3 (1.3.1).

`StatusTracker.getJobIdsForGroup()` is implemented via a linear scan over a HashMap rather than using an index, which might be an expensive operation if there are many (e.g. thousands) of retained jobs.

This patch adds a new map to `JobProgressListener` in order to speed up these lookups.

Author: Josh Rosen <[email protected]>

Closes #4830 from JoshRosen/statustracker-job-group-indexing and squashes the following commits:

e39c5c7 [Josh Rosen] Address review feedback
6709fb2 [Josh Rosen] Merge remote-tracking branch 'origin/master' into statustracker-job-group-indexing
2c49614 [Josh Rosen] getOrElse
97275a7 [Josh Rosen] Add jobGroup to jobId index to JobProgressListener

(cherry picked from commit d44a336)
Signed-off-by: Josh Rosen <[email protected]>

JoshRosen added 2 commits February 27, 2015 23:29

Add jobGroup to jobId index to JobProgressListener

97275a7

getOrElse

2c49614

JoshRosen changed the title ~~[SPARK-6079 ] Use index to speed up StatusTracker.getJobIdsForGroup()~~ [SPARK-6079] Use index to speed up StatusTracker.getJobIdsForGroup() Feb 28, 2015

sryza reviewed Mar 11, 2015

JoshRosen added 2 commits March 25, 2015 11:11

Merge remote-tracking branch 'origin/master' into statustracker-job-g…

6709fb2

…roup-indexing

Address review feedback

e39c5c7

asfgit closed this in d44a336 Mar 26, 2015

JoshRosen deleted the statustracker-job-group-indexing branch April 2, 2015 20:13
