Commit 0d7618a
[SPARK-42585][CONNECT] Streaming of local relations
### What changes were proposed in this pull request?
In this PR, I propose to transfer a local relation to the server in a streaming way when it exceeds a size threshold defined by the SQL config `spark.sql.session.localRelationCacheThreshold`. The config value is 64MB by default. In particular:
1. The client applies the `sha256` function over the Arrow form of the local relation;
2. It checks the presence of the relation on the server side by sending the relation hash to the server;
3. If the server doesn't have the local relation, the client transfers the local relation as an artifact with the name `cache/<sha256>`;
4. Once the relation is present on the server, either because it was already there or because it has just been transferred, the client transforms the logical plan by replacing the `LocalRelation` node with a `CachedLocalRelation` node that carries the hash;
5. The server, in turn, converts `CachedLocalRelation` back to `LocalRelation` by retrieving the relation body from the local cache.
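The five steps above can be sketched as follows. This is a minimal in-memory illustration, not the Spark Connect implementation: `FakeServer`, `to_plan`, and the dict-based plan nodes are hypothetical stand-ins, and the relation is assumed to be already serialized to bytes.

```python
import hashlib

# 64 MB, the default this commit message gives for
# spark.sql.session.localRelationCacheThreshold.
CACHE_THRESHOLD_BYTES = 64 * 1024 * 1024


class FakeServer:
    """Hypothetical in-memory stand-in for the Spark Connect server."""

    def __init__(self):
        self.artifacts = {}  # artifact name -> bytes

    def artifact_exists(self, name):
        return name in self.artifacts

    def add_artifact(self, name, data):
        self.artifacts[name] = data

    def resolve(self, plan):
        # Step 5: turn a cached reference back into an inline relation.
        if plan["type"] == "CachedLocalRelation":
            data = self.artifacts["cache/" + plan["hash"]]
            return {"type": "LocalRelation", "data": data}
        return plan


def to_plan(server, data, threshold=CACHE_THRESHOLD_BYTES):
    """Build the plan node a client would send for a serialized relation."""
    if len(data) <= threshold:
        # Small relations stay inline in the plan.
        return {"type": "LocalRelation", "data": data}
    digest = hashlib.sha256(data).hexdigest()        # step 1: hash the body
    name = "cache/" + digest
    if not server.artifact_exists(name):             # step 2: presence check
        server.add_artifact(name, data)              # step 3: upload if missing
    return {"type": "CachedLocalRelation", "hash": digest}  # step 4
```

Calling `to_plan` twice with the same large payload uploads it only once; the second call finds the artifact by its hash and sends just the reference.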
#### Details of the implementation
The client sends a new command, `ArtifactStatusesRequest`, to check whether the local relation is cached on the server. The command goes through the new RPC endpoint `ArtifactStatus`, and the server answers with the new message `ArtifactStatusesResponse`; see **base.proto**.
The client transfers the serialized (Arrow) body of the local relation and its schema via the RPC endpoint `AddArtifacts`. The server, in turn, stores the received artifact in the block manager under an id of type `CacheId`, which has 3 parts:
- `userId` - the identifier of the user that created the local relation,
- `sessionId` - the identifier of the session which the relation belongs to,
- `hash` - a `sha-256` hash over the relation body.
See **SparkConnectArtifactManager.addArtifact()**.
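A composite key of this shape can be modeled as follows. The sketch is illustrative only: `CacheId` here is a plain Python dataclass, `cache_id_for` is a hypothetical helper, and `store` is a dictionary standing in for the block manager; none of them mirrors the actual Spark classes.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheId:
    """Illustrative three-part key; frozen so instances are hashable."""
    user_id: str
    session_id: str
    hash: str  # hex SHA-256 digest of the relation body


def cache_id_for(user_id, session_id, body):
    """Derive the key for a serialized relation body (hypothetical helper)."""
    return CacheId(user_id, session_id, hashlib.sha256(body).hexdigest())


# A dict stands in for the block manager (assumption for illustration).
store = {}
body = b"serialized relation bytes"
store[cache_id_for("alice", "session-1", body)] = body

# The same bytes in another session produce a different key.
other = cache_id_for("alice", "session-2", body)
```

Because the user and session are part of the key, identical data uploaded from two sessions occupies two separate entries, which is what makes per-session cleanup possible.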
The current query is blocked until the local relation is cached on the server side.
When the server receives the query, it retrieves `userId`, `sessionId` and `hash` from `CachedLocalRelation`, and gets the local relation data from the block manager. See **SparkConnectPlanner.transformCachedLocalRelation()**.
The occupied blocks in the block manager are removed when a user session is invalidated in `userSessionMapping`. See **SparkConnectService.RemoveSessionListener** and **BlockManager.removeCache()**.
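The lifecycle described here, lookup by the three-part key and cleanup when the session goes away, might look like this in miniature. `BlockStore`, `put`, `get`, and `remove_session` are hypothetical names chosen for illustration; the real logic lives in the Spark classes referenced above.

```python
class BlockStore:
    """Hypothetical stand-in for the server-side block storage."""

    def __init__(self):
        self._blocks = {}  # (user_id, session_id, hash) -> bytes

    def put(self, user_id, session_id, digest, data):
        self._blocks[(user_id, session_id, digest)] = data

    def get(self, user_id, session_id, digest):
        key = (user_id, session_id, digest)
        if key not in self._blocks:
            raise KeyError(f"no cached local relation for hash {digest}")
        return self._blocks[key]

    def remove_session(self, user_id, session_id):
        """Drop every block owned by the given session; return the count."""
        doomed = [k for k in self._blocks if k[:2] == (user_id, session_id)]
        for key in doomed:
            del self._blocks[key]
        return len(doomed)
```

Removing a session leaves blocks of other sessions untouched, even when they hold the same hash.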
### Why are the changes needed?
To allow creating a DataFrame from a large local collection. Without the changes, `spark.createDataFrame(...)` fails with the following error:
```java
23/04/21 20:32:20 WARN NettyServerStream: Exception processing message
org.sparkproject.connect.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: gRPC message exceeds maximum size 134217728: 268435456
at org.sparkproject.connect.grpc.Status.asRuntimeException(Status.java:526)
```
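The two numbers in the log are byte counts. A quick conversion, shown here only as a reading aid, makes the mismatch explicit:

```python
MIB = 1024 * 1024

max_message_bytes = 134217728     # limit reported in the log
actual_message_bytes = 268435456  # size of the rejected message

print(max_message_bytes // MIB)     # 128
print(actual_message_bytes // MIB)  # 256
```

In other words, a 256 MiB message was rejected against a 128 MiB limit.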
### Does this PR introduce _any_ user-facing change?
No. The changes extend the existing proto API.
### How was this patch tested?
By running the new tests:
```
$ build/sbt "test:testOnly *.ArtifactManagerSuite"
$ build/sbt "test:testOnly *.ClientE2ETestSuite"
$ build/sbt "test:testOnly *.ArtifactStatusesHandlerSuite"
```
Closes apache#40827 from MaxGekk/streaming-createDataFrame-2.
Authored-by: Max Gekk <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
1 parent d26292c commit 0d7618a
File tree
23 files changed, +922 -208 lines changed
- connector/connect
- client/jvm/src
- main/scala/org/apache/spark/sql
- connect/client
- util
- test/scala/org/apache/spark/sql
- connect/client
- common/src/main/protobuf/spark/connect
- server/src
- main/scala/org/apache/spark/sql/connect
- artifact
- planner
- service
- test/scala/org/apache/spark/sql/connect
- artifact
- service
- core/src/main/scala/org/apache/spark/storage
- python/pyspark/sql/connect/proto
- sql/catalyst/src/main/scala/org/apache/spark/sql/internal