Skip to content

Speedup equals #126394

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Apr 9, 2025
Merged

Speedup equals #126394

merged 6 commits into from
Apr 9, 2025

Conversation

idegtiarenko
Copy link
Contributor

@idegtiarenko idegtiarenko commented Apr 7, 2025

We spend ~7-8% of time computing equals during tree traversal.
If my understanding correct, we mostly return the same sub-tree/attributes. However our equals checks are structured in a way that reaching to this == obj require multiple super.equals calls. Even if that check returns true, we are still comparing fields in sub-classes (this might get quiet expensive with string equality check).

This PR restructure equals to always check reference equality first, before calling super.
I believe this should become cheaper. We might also consider adding equalsByFields that skip reference and class check and call it from the sub-classes to avoid duplicating those in every equals check.

image
^^ equals calls are highlighted in pink

@idegtiarenko idegtiarenko added >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v9.1.0 labels Apr 7, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change can make sense.

But it increases complexity, adding risk to very fundamental methods of important classes; we had bugs from wrong equals implementations before, and they can be tricky to find.

Therefore, before merging it, I'd like to:

  • Know how much faster this makes us in some queries, and ideally why.
  • Add javadoc/comments so that future us know why the code is the way it is.
  • Check if we can factor duplicated code into helper methods without losing (a lot of) the performance gains.

if (super.equals(obj)) {
Attribute other = (Attribute) obj;
return Objects.equals(nullability, other.nullability);
if (this == obj) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand this change.

The checks that were added to Attribute.equals are already performed in super.equals, and in the same order. Now we're performing these checks twice, and this equals method has become confusing to read.

How does this contribute to a speedup? Is there lower cost from megamorphic calls with this change? (If needed, this can be confirmed by running a small microbench with the async profiler and having it print the machine code for hot parts of the code.) How much faster do we get with this change? I.e., is it worth it to make our equals methods more complicated? Unless the speedup is noticeable, making equals methods complex is risky and we had bugs in those before.

If the megamorphic calls are causing slowdowns and this makes it noticeably better, I think we should:

  • a) Add javadoc to NamedExpression.java for warning everyone that they shouldn't just use super.equals as the first thing they call because of the performance cost of megamorphic calls, and they should rather perform all cheap equality check before that, and
  • b) add a method NamedExpression.equalsByFields and call it from here to avoid redundancy, like you suggested.

The latter is also important to make the intent of this code clear - currently, it appears like the code is redundant due to technical debt. (Which is a sane assumption, given that we inherited this code initially from the ql package and many redundancies actually just haven't been cleaned up, yet.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there lower cost from megamorphic calls with this change?

I believe these are the rule for if you pay the megamorphic cost is "just": did you invokevirtual on a method with more than two overrides loaded by the jvm? I'll bet it's more complex than that - but that's a simple rule you can hold in your head.

UnsupportedAttribute a, b;   a.equal(b);   NO - concrete subclass
FieldAttribute a, b;   a.equal(b);   NO - only one subclass - it's pretty fast
Attribute a, b;   a.equal(b);   YES - many subclasses

For what it's worth, I think we might have more luck with the kind of lame way of writing equals:

Node {
  final boolean equals(Object other) {
    if(this == other) {return true;}
    if(false == getClass().equals(other.getClass())) { return false; }
    return nodeEquals(other) && children.equals(other.children)
  }
  abstract boolean nodeEquals(Object other);
}
Expression {
  @Override final boolean nodeEquals(Object other) {
    return whatever && expressionEquals(other);
  }
  abstract boolean expressionEquals(Object other);

This would force the == and getClass paths super early. No dynamic invocations to get to them because the root method is final. OTOH, you'd end up with a chain of virtual invocations. On the other other hand it's easier to maintain the list of comparisons here.

On the other other other other other other hand - we could get a lot of benefit from doing the == comparison in the transform methods and skipping .equals if they are ==.

I dunno. It'd take some fiddling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this offline, short summary:

Is there lower cost from megamorphic calls with this change?

Correct, I expect this part to become a bit cheaper.

The other part that contribute to the cost of equals is that even if this == obj is true and checked in parent equals, with current structure we still compare child object fields. This might get expensive when comparing strings or other non primitive types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline with @idegtiarenko : it's potentially not even about megamorphic calls, but about the fact that calling super.equals(...) doesn't currently short circuit and immediately return true if the objects are identical.

However, identical objects is something we encounter all the time when optimizer rules try to make changes that don't apply to the current node, e.g. here.

Comment on lines 177 to 180
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass() || super.equals(o) == false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we encapsulate this repeated part (it's the same for all inheritors of NamedExpression) in a (static?) method, or would that kill any gains that we got from inlining this?

}
return false;
public int hashCode() {
return Objects.hash(super.hashCode(), message, hasCustomMessage);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we flip the order of message and hasCustomMessage?

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed offline, the gist of this change makes a lot of sense, namely the short-circuiting of equals in case of identical objects; I think this improvement is obvious enough that it doesn't require a micro bench to demonstrate the effect, as long as we can comment this well, and refactor this a little to cut down on redundancy.

if (super.equals(obj)) {
Attribute other = (Attribute) obj;
return Objects.equals(nullability, other.nullability);
if (this == obj) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline with @idegtiarenko : it's potentially not even about megamorphic calls, but about the fact that calling super.equals(...) doesn't currently short circuit and immediately return true if the objects are identical.

However, identical objects is something we encounter all the time when optimizer rules try to make changes that don't apply to the current node, e.g. here.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good now and makes an obvious perf improvement without sacrificing maintainability IMO. I like it, thanks @idegtiarenko !

There's a minor change in UnresolvedAttribute.equals that should be fixed before merging, but otherwise this LGTM!

@@ -61,15 +61,25 @@ public int hashCode() {
return Objects.hash(super.hashCode(), name, synthetic);
}

/**
* Polymorphic equality is a pain.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's mention that we consider performance, and especially short circuiting in case of object equality, because our tree traversal methods usually try to return the same object when there is no change needed.

/**
* Polymorphic equality is a pain.
* This equals shortcuts `this == o` and type checks.
* Here equals is final to ensure we are not duplication those checks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Here equals is final to ensure we are not duplication those checks.
* Here equals is final to ensure we are not duplicating those checks.

@idegtiarenko idegtiarenko merged commit e95397c into elastic:main Apr 9, 2025
17 checks passed
@idegtiarenko idegtiarenko deleted the speedup_equals branch April 9, 2025 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants