Quiet "Unable to load native-hadoop library for your platform" message #51136

sryza · 2025-06-09T17:17:01Z

What changes were proposed in this pull request?

This PR proposes suppressing this warning that gets logged at startup if native-hadoop libraries aren't installed:

25/06/09 10:16:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
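The shell loads its logging setup from org/apache/spark/log4j2-defaults.properties, as the startup output below shows. Given that, one way to express this kind of suppression is a per-logger threshold in log4j2's properties syntax. This is a minimal sketch, not necessarily the exact diff in this PR. The logger id "nativeloader" is a hypothetical name. org.apache.hadoop.util.NativeCodeLoader is the class that emits the message.

```properties
# Hypothetical logger id "nativeloader". Raising this logger's threshold to
# ERROR hides the WARN-level "Unable to load native-hadoop library" message
# without changing the level of any other logger.
logger.nativeloader.name = org.apache.hadoop.util.NativeCodeLoader
logger.nativeloader.level = error
```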

Why are the changes needed?

This warning (among others) creates a noisy experience for users getting started with Spark locally, who are unlikely to care about it.

$ bin/pyspark
Python 3.9.21 (main, Apr 24 2025, 15:50:57)
[Clang 16.0.0 (clang-1600.0.26.6)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
WARNING: Using incubator modules: jdk.incubator.vector
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/06/09 10:16:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Does this PR introduce any user-facing change?

Yes

How was this patch tested?

Ran bin/pyspark after making the change and observed that the message was not shown.

Was this patch authored or co-authored using generative AI tooling?

No

LuciferYang · 2025-06-09T18:10:29Z

Hmm... I'm not sure whether this would mislead existing users into incorrectly thinking that the native-hadoop library loaded successfully.

sryza · 2025-06-09T20:20:29Z

@LuciferYang – the thing that I would think about here is: of all the information that's useful to show users when they launch their pyspark shell, is this among the most important? Imagine this message wasn't present: would we want to add it? I don't have hard data on this, but I would guess that a majority of users haven't heard of native Hadoop, and many that know what it is don't care.

This message will be one of the first things that newcomers to Spark see, which is not a great experience.

I'm not sure whether the change proposed in this PR is the right solution. Do you have ideas for addressing this while still making the information accessible to more advanced users who might care about it?

pan3793 · 2025-06-10T03:48:52Z

This message is important for us to identify whether the Hadoop native libraries are loaded properly, because without them some Java/shell fallbacks are quite slow, and some compression codecs are just broken.
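Users who need to know whether the native libraries loaded can keep the message visible even if the defaults suppress it. Spark falls back to its bundled log4j2-defaults.properties only when it finds no user-supplied log4j2 configuration. So a custom conf/log4j2.properties, for example one copied from conf/log4j2.properties.template, can pin this logger's level explicitly. The sketch below assumes that setup, and the logger id "nativeloader" is illustrative.

```properties
# Hypothetical logger id "nativeloader". Pinning the level at WARN keeps the
# native-hadoop loading message visible regardless of the root logger level.
logger.nativeloader.name = org.apache.hadoop.util.NativeCodeLoader
logger.nativeloader.level = warn
```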

sryza · 2025-06-10T15:08:25Z

Compared to newer frameworks like DuckDB and dbt, Spark has a reputation for being relatively kludgy, and I think these kinds of log messages contribute to that. I wonder if there are ways we can make this information available to experts who need it, but not make it so prominent for people who are just trying out or developing with Spark on their laptop?

pan3793 · 2025-06-11T12:42:44Z

How about suppressing this warning only in REPL?

That would not surprise me, because the REPL sets the default log level to WARN and already swallows many useful diagnostic logs compared to normal jobs run via spark-submit.

Quiet "Unable to load native-hadoop library for your platform" message

d234c9e
