
Quiet "Unable to load native-hadoop library for your platform" message #51136



Open (wants to merge 1 commit into master)


sryza (Contributor) commented Jun 9, 2025

What changes were proposed in this pull request?

This PR proposes suppressing this warning that gets logged at startup if native-hadoop libraries aren't installed:

25/06/09 10:16:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
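One plausible way to quiet a single logger like this is through log4j2 configuration (the startup banner shows Spark loading org/apache/spark/log4j2-defaults.properties): raise the threshold of the org.apache.hadoop.util.NativeCodeLoader logger above WARN. The PR description does not show its diff, so this is a sketch under that assumption, not the actual commit. The hypothetical Python helper `native_loader_override` below only renders the two log4j2 properties-format lines; the logger id `NativeCodeLoader` is an arbitrary label, and where the lines would go (Spark's bundled defaults or a user's conf/log4j2.properties) is also an assumption.

```python
# Hypothetical helper (not part of Spark or Hadoop): renders log4j2
# properties-format lines that raise the threshold of Hadoop's
# NativeCodeLoader logger, so its WARN-level startup message is dropped.

VALID_LEVELS = {"all", "trace", "debug", "info", "warn", "error", "fatal", "off"}


def native_loader_override(level: str = "error") -> str:
    """Return the two log4j2 properties lines for the NativeCodeLoader logger.

    'logger.<id>.name' and 'logger.<id>.level' follow log4j2's properties
    syntax; the id 'NativeCodeLoader' is an arbitrary label chosen here.
    """
    level = level.lower()
    if level not in VALID_LEVELS:
        raise ValueError(f"unsupported log4j2 level: {level!r}")
    logger_id = "NativeCodeLoader"
    logger_name = "org.apache.hadoop.util.NativeCodeLoader"
    return (
        f"logger.{logger_id}.name = {logger_name}\n"
        f"logger.{logger_id}.level = {level}\n"
    )


if __name__ == "__main__":
    # Print the lines so they can be appended to a log4j2.properties file
    # (for example, a conf/log4j2.properties copied from Spark's template).
    print(native_loader_override(), end="")
```

Choosing `error` rather than `off` hides only this WARN-level message while still letting ERROR-level output from the same class through.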

Why are the changes needed?

This warning (among others) makes for a noisy experience for users getting started with Spark locally, who aren't likely to care about it.

$ bin/pyspark
Python 3.9.21 (main, Apr 24 2025, 15:50:57)
[Clang 16.0.0 (clang-1600.0.26.6)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/06/09 10:16:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Does this PR introduce any user-facing change?

Yes. The NativeCodeLoader warning is no longer logged at startup.

How was this patch tested?

Ran bin/pyspark after making the change and observed that the message was not shown.

Was this patch authored or co-authored using generative AI tooling?

No

LuciferYang (Contributor) commented

Hmm... I'm not sure whether this would mislead existing users into thinking that the native-hadoop library has loaded successfully.

sryza (Contributor, Author) commented Jun 9, 2025

@LuciferYang – the question I'd ask here is: of all the information that's useful to show users when they launch their pyspark shell, is this among the most important? Imagine this message weren't present: would we want to add it? I don't have hard data on this, but I would guess that a majority of users haven't heard of native Hadoop, and many who know what it is don't care.

This message will be one of the first things that newcomers to Spark see, which is not a great experience.

I'm not sure the change proposed in this PR is the right solution. Do you have ideas for addressing this while keeping the information accessible to more advanced users who might care about it?

pan3793 (Member) commented Jun 10, 2025

This message is important for us to identify whether the Hadoop native libraries are loaded properly: without them, some Java/shell fallbacks are quite slow, and some compression codecs are simply broken.

sryza (Contributor, Author) commented Jun 10, 2025

Compared to newer frameworks like DuckDB and dbt, Spark has a reputation for being relatively kludgy, and I think these kinds of log messages contribute to that. I wonder whether there are ways to make this information available to experts who need it without making it so prominent for people who are just trying out Spark or developing with it on their laptops.

pan3793 (Member) commented Jun 11, 2025

How about suppressing this warning only in the REPL?

That wouldn't surprise me, because the REPL already sets the default log level to WARN and swallows many useful diagnostic logs compared to normal jobs run via spark-submit.
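The REPL-only idea amounts to a simple policy: keep the message for batch jobs, hide it in interactive shells. Spark's own logging goes through log4j2 on the JVM, so the block below is only an analogy built on Python's standard `logging` module, not Spark's mechanism. The names `ReplOnlySuppressor`, `make_handler`, and `demo` are hypothetical, the second logger and its message are made up for the demo, and detecting a REPL via `sys.ps1` is an assumption about how "interactive" would be decided.

```python
import io
import logging
import sys

# Logger name borrowed from the Hadoop class; here it is only a Python logger.
NOISY_LOGGER = "org.apache.hadoop.util.NativeCodeLoader"


class ReplOnlySuppressor(logging.Filter):
    """Drop records from one named logger, but only in interactive sessions."""

    def __init__(self, logger_name: str, interactive: bool) -> None:
        super().__init__()
        self.logger_name = logger_name
        self.interactive = interactive

    def filter(self, record: logging.LogRecord) -> bool:
        # Batch mode keeps every record; a REPL hides only the named logger.
        return not (self.interactive and record.name == self.logger_name)


def is_interactive() -> bool:
    # sys.ps1 is defined only when the interpreter runs interactively.
    return hasattr(sys, "ps1") or bool(sys.flags.interactive)


def make_handler(stream, interactive: bool) -> logging.Handler:
    """Build a handler whose filter applies the REPL-only policy."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    # Handler-level filters also see records propagated from child loggers.
    handler.addFilter(ReplOnlySuppressor(NOISY_LOGGER, interactive))
    return handler


def demo(interactive: bool) -> str:
    """Log two warnings under 'org.apache.hadoop' and return what got through."""
    buf = io.StringIO()
    parent = logging.getLogger("org.apache.hadoop")
    parent.setLevel(logging.WARNING)
    handler = make_handler(buf, interactive)
    parent.addHandler(handler)
    try:
        logging.getLogger(NOISY_LOGGER).warning("Unable to load native-hadoop library")
        logging.getLogger("org.apache.hadoop.example").warning("another warning")
    finally:
        parent.removeHandler(handler)
    return buf.getvalue()


if __name__ == "__main__":
    print(demo(interactive=is_interactive()), end="")
```

Attaching the filter to a handler rather than to a logger matters here: in Python, a logger's own filters only see records logged directly on that logger, while a handler's filters also see records propagated up from child loggers.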
