[SPARK-52598][DOCS] Reorganize Spark Connect programming guide #51305
Conversation
docs/spark-connect-setup.md
Outdated
In a terminal window, go to the `spark` folder in the location where you extracted
Spark before and run the `start-connect-server.sh` script to start Spark server with
Spark Connect, like in this example:
Can we add more instructions on how to use it with `SPARK_HOME`?
Sure, I'll take a crack at that.
Just FYI, these instructions were moved from the existing Spark Connect Overview page.
In a terminal window, go to the `spark` folder in the location where you extracted
Spark before and run the `start-connect-server.sh` script to start Spark server with
Spark Connect. If you already have Spark installed and `SPARK_HOME` defined, you can use that too.

```bash
cd spark/
./sbin/start-connect-server.sh

# alternately
"$SPARK_HOME/sbin/start-connect-server.sh"
```
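Once the server is running, a client can connect to it over the Spark Connect protocol. As a rough sketch of what that looks like from PySpark (assuming a local PySpark installation with the `connect` extras and the server listening on the default port 15002, neither of which this snippet configures):

```python
from pyspark.sql import SparkSession

# Connect to the Spark Connect server started above.
# "sc://localhost:15002" assumes the default port on the local machine.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# Run a trivial query to confirm the connection works.
spark.range(5).show()
```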
@allisonwang-db - Is this what you were looking for?
Yeah, not sure if we can point to this doc: https://spark.apache.org/docs/latest/api/python/getting_started/install.html#manually-downloading
I don't mind adding that as a link, but I think it's a bit confusing to jump at this point from the main narrative documentation at the root of the site to this parallel set of documentation under `api/python/`. I know this is a larger problem with the documentation that we have discussed in the past.
Referencing the PySpark doc from the main spark doc is indeed less ideal. Do we have any other installation docs we can link here?
I believe the general installation instructions are on this page.
@grundprinzip - After this reorg of the existing Connect documentation is merged in, I am planning to add a new page with narrative documentation for tools that Spark Connect clients would find useful, as we briefly discussed here.
What changes were proposed in this pull request?
This PR reorganizes the narrative Spark Connect documentation into a guide that matches the pattern we are already using elsewhere in the docs for the DataFrame API, Structured Streaming, and so forth.
It adds a new entry in the "Programming Guides" dropdown for Spark Connect, and reorganizes the existing two Spark Connect pages into three:
spark-connect-overview.html
spark-connect-setup.html
spark-connect-server-libs.html
This is what the reorganized guide looks like:
Why are the changes needed?
The prose currently in Application Development with Spark Connect partly repeats what's in the overview, and the overview itself is a bit longer than necessary because it mixes a genuine introduction to Spark Connect with a technical guide on how to set it up.
With this information reorganized, the guide should be clearer to navigate and follow, and it becomes easier to add more narrative Spark Connect documentation since we now have a dedicated guide with its own left sidebar.
In a future PR, I intend to add a new page dedicated to the client side of working with Spark Connect, which will mirror the existing page we have for the server side.
Does this PR introduce any user-facing change?
Documentation only.
How was this patch tested?
I built the docs locally and reviewed them in my browser.
Was this patch authored or co-authored using generative AI tooling?
No.