Skip to content

Handle unavailable MD5 in ES|QL #130158

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 3, 2025
Merged

Conversation

tvernum
Copy link
Contributor

@tvernum tvernum commented Jun 27, 2025

In Java 14 the MessageDigest specification was changed so that the "MD5" hash function is no longer required. It is permissible for a JRE to ship without support for MD5 hashes.

This commit modifies the ES|QL MD5 hash function implementation so that the MessageDigest object is no longer loaded on startup, and instead is lazily instantiated when needed.

If an ES|QL query makes use of md5, and it is unavailable, then the query will fail with an ES|QL verification exception

Resolves: #129689

In Java 14 the `MessageDigest` specification was changed so that the
"MD5" hash function is no longer required. It is permissible for a JRE
to ship without support for MD5 hashes.

This commit modifies the ES|QL MD5 hash function implementation so
that the `MessageDigest` object is no longer loaded on startup, and
instead is lazily instantiated when needed.

If an ES|QL query makes use of md5, and it is unavailable, then the
query will fail with an ES|QL verification exception

Resolves: elastic#129689
@tvernum tvernum requested a review from idegtiarenko June 27, 2025 05:46
@tvernum tvernum added >bug auto-backport Automatically create backport pull requests when merged :Analytics/ES|QL AKA ESQL v8.19.0 v9.1.0 v9.1.1 v8.19.1 labels Jun 27, 2025
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0 labels Jun 27, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @tvernum, I've created a changelog YAML for you.

@tvernum
Copy link
Contributor Author

tvernum commented Jun 27, 2025

It's quite tricky to set up a JVM that has sufficient security providers to make ES run, but doesn't have MD5.
I believe it's possible to get that by using RedHat's build of OpenJDK on a FIPS enabled version of RHEL, or alternatively you can hack it (which is what I did)

To save others the trouble of setting that up, this is the behaviour you get (with this PR)

POST /_query
{
  "query": "FROM index-1 | EVAL hash=md5(string) | KEEP string, hash"
}

===
{
  "error": {
    "root_cause": [
      {
        "type": "verification_exception",
        "reason": "function 'md5' is not available on this platform: MD5 MessageDigest not available"
      }
    ],
    "type": "verification_exception",
    "reason": "function 'md5' is not available on this platform: MD5 MessageDigest not available"
  },
  "status": 400
}

Without this PR Elasticsearch doesn't start

[2025-06-27T15:57:02,576][ERROR][o.e.b.Elasticsearch      ] [node01] fatal exception while booting Elasticsearch
java.lang.ExceptionInInitializerError: null
        at org.elasticsearch.xpack.esql.expression.function.scalar.ScalarFunctionWritables.getNamedWriteables(ScalarFunctionWritables.java:90) ~[?:?]
        at org.elasticsearch.xpack.esql.expression.ExpressionWritables.scalars(ExpressionWritables.java:160) ~[?:?]
        at org.elasticsearch.xpack.esql.expression.ExpressionWritables.getNamedWriteables(ExpressionWritables.java:115) ~[?:?]
        at org.elasticsearch.xpack.esql.plugin.EsqlPlugin.getNamedWriteables(EsqlPlugin.java:329) ~[?:?]
...
        at org.elasticsearch.node.Node.<init>(Node.java:185) ~[elasticsearch-9.2.0-SNAPSHOT.jar:?]
        at org.elasticsearch.bootstrap.Elasticsearch$1.<init>(Elasticsearch.java:397) ~[elasticsearch-9.2.0-SNAPSHOT.jar:?]
        at org.elasticsearch.bootstrap.Elasticsearch.initPhase3(Elasticsearch.java:397) ~[elasticsearch-9.2.0-SNAPSHOT.jar:?]
        at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:100) ~[elasticsearch-9.2.0-SNAPSHOT.jar:?]
Caused by: java.lang.IllegalStateException: java.security.NoSuchAlgorithmException: MD5 MessageDigest not available
        at org.elasticsearch.xpack.esql.expression.function.scalar.string.Hash$HashFunction.create(Hash.java:196) ~[?:?]
        at org.elasticsearch.xpack.esql.expression.function.scalar.string.Md5.<clinit>(Md5.java:27) ~[?:?]
        ... 29 more
Caused by: java.security.NoSuchAlgorithmException: MD5 MessageDigest not available
        at sun.security.jca.GetInstance.getInstance(GetInstance.java:159) ~[?:?]
        at java.security.MessageDigest.getInstance(MessageDigest.java:185) ~[?:?]
        at org.elasticsearch.xpack.esql.expression.function.scalar.string.Hash$HashFunction.create(Hash.java:193) ~[?:?]
        at org.elasticsearch.xpack.esql.expression.function.scalar.string.Md5.<clinit>(Md5.java:27) ~[?:?]

I picked VerificationException because it seemed like the closest fit, but I can change it if there's a better option.

I've marked this as >bug with backports to 8.19.0 and 9.1.0 because I know it is affecting people.
However that's arguable because we don't officially support RH OpenJDK in FIPS mode (though we do support it in regular mode) and I don't know of any other JDK that exhibits this behaviour. If you're concerned about the impact we can discuss that classification - but I think it's best to have it fixed in 8.19.0 because it's likely that we'll need to support JDKs without MD5 during the lifetime of 8.19

/**
* As of Java 14, it is permissible for a JRE to ship without the {@code MD5} {@link MessageDigest}.
* We want the "md5" function in ES|QL to fail at runtime on such platforms (rather than at startup)
* so we build the {@link HashFunction} lazily.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the same applicable for SHA functions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably worth mentioning that this might not be available in certain JVMs at

Copy link
Contributor Author

@tvernum tvernum Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, as at Java 24, both SHA-1 and SHA-256 are still required

I could change those functions to work similarly, but I opted for the smallest change that solves the known problem. I actually started with a change in HashFunction that would be lenient for any algorithm, but switched to something MD5 specific.

throw new VerificationException("function 'md5' is not available on this platform: {}", e.getMessage());
}
}
return function;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if we do not try to initialize it for every query even if it is not available. May be something like:

    private static final HashFunction MD5 = HashFunction.tryCreate("MD5");

and in Hash.java:

        public static HashFunction tryCreate(String algorithm) {
            try {
                return new HashFunction(algorithm, MessageDigest.getInstance(algorithm));
            } catch (NoSuchAlgorithmException e) {
                logger.warn("[{}] hash fuinction is not available", algorithm);
                return null;
            }
        }

This would also mean we need to modify org.elasticsearch.xpack.esql.expression.function.scalar.string.AbstractHashFunction#toEvaluator to use different (warning/error only) hash evaluator when getHashFunction returns null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if we do not try to initialize it for every query even if it is not available.

So would I.
We're lacking an appropriate Result (Maybe) type, that has either a value or an error.
That is what I really wanted to use here, but opted for a lazy reference instead.

I can handle it as you describe, the issue is that we lose the original exception (its only in the logs) and have to treat null as a marker for "not available".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly it worth introducing wrapper that captures exception and re-throws it when executed in new or existing constant evaluator.

Copy link
Contributor

@astefan astefan Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is JVM specific, I am wondering if there are scenarios (like from test | eval hash=hash("md5", message) | stats count(hash) by foo where the stats is a pipeline breaker which sends the plan fragment to each data node) the hash is computed at data node level and if data nodes have different JVMs than that of the coordinator, a lazy approach is actually the correct approach (like what @tvernum is suggesting).

Sorry if I am mentioning something obvious, but thought on voicing it loud rather than just assuming this is already considered.

Copy link
Contributor

@idegtiarenko idegtiarenko Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple from index | eval hash = md5(field) would spread md5 evaluation to each node too.
Since md5 is not going to become available after then node has started there is no need to try to initialize it more than once after the start. In case of nodes running distinct JVMs in this approach we might have nulls and actual hashes from different nodes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple from index | eval hash = md5(field) would spread md5 evaluation to each node too

It doesn't. The hash is computed on the coordinator after each node replies back with 1000 (the default) rows. The coordinator will pick the final 1000 rows and then compute the hash on them.

This is how a physical plan would look like on the coordinator:


EvalExec[[HASH(md5[KEYWORD],field{f}#119) AS hash#115]]
 \_LimitExec[1000[INTEGER],null]
   \_ExchangeExec[[field{f}#119],false]
     \_FragmentExec[fragment=[<>
													Project[[field{f}#119]]
													\_Limit[1000[INTEGER],false]
                                                       \_EsRelation[index][field{f}#119]<>]]

Copy link
Contributor Author

@tvernum tvernum Jun 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced a Result so that we only try to obtain the message digest once (although that's not strictly true, because message digests aren't thread safe, so we actually obtain a new one each time the HashFunction is copied.)

@tvernum tvernum requested a review from a team as a code owner June 30, 2025 12:16
@@ -45,6 +53,27 @@ public void testInvalidAlgorithmLiteral() {
assertThat(e.getMessage(), startsWith("invalid algorithm for [hast(\"invalid\", input)]: invalid MessageDigest not available"));
}

public void testTryCreateUnavailableMd5() throws NoSuchAlgorithmException {
assumeFalse("We run with different security providers in FIPS, and changing them at runtime is more complicated", inFipsJvm());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth having another test with assumeTrue(inFipsJvm()) that verifies MD5 is not available? Or is it not always working like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MD5 is usually available on FIPS, and is available in our FIPS testing environment. It's only in very particular configurations that it isn't available (which is why nothing is failing on main today)

return MD5.get();
} catch (NoSuchAlgorithmException e) {
// Throw a new exception so that the stack trace reflects this call (rather than the static initializer for the MD5 field)
throw new VerificationException("function 'md5' is not available on this platform: {}", e.getMessage());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth adding e as the cause?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

VerificationException doesn't take a cause. I can add that if the team thinks it's worth it (but ExceptionUtils.math has the same behaviour I used here)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I guess it is okay to skip it for now

* As of Java 14, it is permissible for a JRE to ship without the {@code MD5} {@link MessageDigest}.
* We want the "md5" function in ES|QL to fail at runtime on such platforms (rather than at startup)
* so we wrap the {@link HashFunction} in a {@link Result}.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we consider an option to emit a warning and supply null instead of md5 hash?
Or is it proffered to fail the entire query instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems to me like a poor place to be lenient - I would definitely argue against it. The behaviour here is roughly the same as what hash("md5", field) or hash("does-not-exist", field) would provide, which seems right to me.

Copy link
Contributor

@idegtiarenko idegtiarenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

@tvernum tvernum merged commit 3bda6af into elastic:main Jul 3, 2025
32 checks passed
tvernum added a commit to tvernum/elasticsearch that referenced this pull request Jul 3, 2025
In Java 14 the `MessageDigest` specification was changed so that the
"MD5" hash function is no longer required. It is permissible for a JRE
to ship without support for MD5 hashes.

This commit modifies the ES|QL MD5 hash function implementation so
that if the specified `MessageDigest` object is not available on startup,
then the error is non-fatal, and the node will still boot successfully.

If an ES|QL query attempts to make use of md5, and it is unavailable,
then the query will fail with an ES|QL verification exception

Resolves: elastic#129689
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.19
9.1

elasticsearchmachine pushed a commit that referenced this pull request Jul 3, 2025
In Java 14 the `MessageDigest` specification was changed so that the
"MD5" hash function is no longer required. It is permissible for a JRE
to ship without support for MD5 hashes.

This commit modifies the ES|QL MD5 hash function implementation so
that if the specified `MessageDigest` object is not available on startup,
then the error is non-fatal, and the node will still boot successfully.

If an ES|QL query attempts to make use of md5, and it is unavailable,
then the query will fail with an ES|QL verification exception

Resolves: #129689
elasticsearchmachine pushed a commit that referenced this pull request Jul 3, 2025
In Java 14 the `MessageDigest` specification was changed so that the
"MD5" hash function is no longer required. It is permissible for a JRE
to ship without support for MD5 hashes.

This commit modifies the ES|QL MD5 hash function implementation so
that if the specified `MessageDigest` object is not available on startup,
then the error is non-fatal, and the node will still boot successfully.

If an ES|QL query attempts to make use of md5, and it is unavailable,
then the query will fail with an ES|QL verification exception

Resolves: #129689
mridula-s109 pushed a commit to mridula-s109/elasticsearch that referenced this pull request Jul 3, 2025
In Java 14 the `MessageDigest` specification was changed so that the
"MD5" hash function is no longer required. It is permissible for a JRE
to ship without support for MD5 hashes.

This commit modifies the ES|QL MD5 hash function implementation so
that if the specified `MessageDigest` object is not available on startup,
then the error is non-fatal, and the node will still boot successfully.

If an ES|QL query attempts to make use of md5, and it is unavailable,
then the query will fail with an ES|QL verification exception

Resolves: elastic#129689
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v8.19.1 v9.1.0 v9.1.1 v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't enable ES|QL MD5 function in FIPS mode
4 participants