[SPARK-52598][DOCS] Reorganize Spark Connect programming guide #51305

Open
wants to merge 2 commits into
base: master
from

nchammas
Contributor

@nchammas nchammas commented Jun 27, 2025

What changes were proposed in this pull request?

This PR reorganizes the narrative Spark Connect documentation into a guide that matches the pattern we are already using elsewhere in the docs for the DataFrame API, Structured Streaming, and so forth.

It adds a new entry in the "Programming Guides" dropdown for Spark Connect, and reorganizes the existing two Spark Connect pages into three:

  • Spark Connect Guide: spark-connect-overview.html
  • Setting up Spark Connect: spark-connect-setup.html
  • Extending Spark with Spark Server Libraries: spark-connect-server-libs.html

This is what the reorganized guide looks like:

Why are the changes needed?

The prose currently in Application Development with Spark Connect partly repeats what's in the overview, and the overview itself is a bit longer than necessary because it mixes a genuine introduction to Spark Connect with a technical guide on how to set it up.

With this information reorganized, the guide should be easier to map out and follow. The reorganization also makes it easier to add more narrative Spark Connect documentation, since we now have a dedicated guide with its own left sidebar.

In a future PR, I intend to add a new page dedicated to the client side of working with Spark Connect, which will mirror the existing page we have for the server side.

Does this PR introduce any user-facing change?

Documentation only.

How was this patch tested?

I built the docs locally and reviewed them in my browser.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the DOCS label Jun 27, 2025
@nchammas nchammas changed the title [DOCS] Reorganize Spark Connect programming guide [SPARK-52598][DOCS] Reorganize Spark Connect programming guide Jun 27, 2025
@nchammas nchammas marked this pull request as ready for review June 27, 2025 15:56
Comment on lines 41 to 43
In a terminal window, go to the `spark` folder in the location where you extracted
Spark before and run the `start-connect-server.sh` script to start Spark server with
Spark Connect, like in this example:
Contributor

Can we add more instructions on how to use it with SPARK_HOME?

Contributor Author

Sure, I'll take a crack at that.

Just FYI, these instructions were moved from the existing Spark Connect Overview page.

Comment on lines +41 to +51
In a terminal window, go to the `spark` folder in the location where you extracted
Spark before and run the `start-connect-server.sh` script to start Spark server with
Spark Connect. If you already have Spark installed and `SPARK_HOME` defined, you can use that too.

```bash
cd spark/
./sbin/start-connect-server.sh

# alternately
"$SPARK_HOME/sbin/start-connect-server.sh"
```
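
The two invocations in the snippet above can also be folded into a single path lookup. The following is only a minimal sketch, not part of the proposed docs: `connect_script_path` is a hypothetical helper name, and the `./spark` fallback assumes the extracted Spark folder sits in the current directory, as in the `cd spark/` form.

```bash
# Hypothetical helper (not shipped with Spark): print the path of the
# Spark Connect start script. Uses SPARK_HOME when it is set and non-empty;
# otherwise assumes Spark was extracted into ./spark.
connect_script_path() {
  spark_dir="${SPARK_HOME:-./spark}"
  # Strip one trailing slash so SPARK_HOME=/opt/spark/ also resolves cleanly.
  printf '%s\n' "${spark_dir%/}/sbin/start-connect-server.sh"
}

# Usage (left commented out so that sourcing this file starts nothing):
# script="$(connect_script_path)"
# if [ -x "$script" ]; then "$script"; fi
```

Checking that the resolved script is executable before running it keeps the failure mode obvious when neither location holds a Spark installation.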
Contributor Author

@allisonwang-db - Is this what you were looking for?

Contributor

Contributor Author

@nchammas nchammas Jun 27, 2025

I don't mind adding that as a link, but I think it's a bit confusing to jump at this point from the main narrative documentation at the root of the site to this parallel set of documentation under api/python/. I know this is a larger problem with the documentation that we have discussed in the past.

Contributor

Referencing the PySpark doc from the main Spark doc is indeed less than ideal. Do we have any other installation docs we can link here?

Contributor Author

I believe the general installation instructions are on this page.

Contributor Author

nchammas commented Jul 3, 2025

@grundprinzip - After this reorg of the existing Connect documentation is merged in, I am planning to add a new page with narrative documentation for tools that Spark Connect clients would find useful, as we briefly discussed here.
