Skip to content

MONGOID-5391 - Support Criteria#pluck_each method #5332

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

johnnyshields
Copy link
Contributor

@johnnyshields johnnyshields commented Jun 19, 2022

Resolves MONGOID-5391

This PR extracts out some code into Mongoid::Contextual::Mongo::PluckEnumerator. This class is used for both #pluck and #pluck_each (#pluck is simply pluck_each(fields).to_a.) There has been no change to the logic of #pluck; all existing #pluck cases pass without modification.

@johnnyshields johnnyshields changed the title Add #pluck_each, extract out PluckEnumerator Add #pluck_each to Contextual Jun 19, 2022
@johnnyshields johnnyshields changed the title Add #pluck_each to Contextual Draft: Add #pluck_each to Contextual Jun 19, 2022
@johnnyshields johnnyshields marked this pull request as draft June 19, 2022 10:20
@johnnyshields johnnyshields changed the title Draft: Add #pluck_each to Contextual MONGOID-5391 - Add #pluck_each to Contextual Jun 19, 2022
@johnnyshields
Copy link
Contributor Author

johnnyshields commented Jun 19, 2022

This feature works brilliantly. So fast! 🚀

@johnnyshields johnnyshields marked this pull request as ready for review June 21, 2022 04:54
@johnnyshields
Copy link
Contributor Author

This is ready for review.

@johnnyshields johnnyshields changed the title MONGOID-5391 - Add #pluck_each to Contextual MONGOID-5391 - Support Criteria#pluck_each method Jun 21, 2022
@johnnyshields
Copy link
Contributor Author

johnnyshields commented Jun 25, 2022

@neilshweky please review this PR, think it will be useful for your work on tally to sync up our implementations. It also moves some code you wrote in an earlier PR.

database_field_names
else
@fields.map {|f| @klass.cleanse_localized_field_names(f) }
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your implementation of normalized_field_names differs slightly to the one I had previously.

Can you add a test in which we pluck both a localized field, and it's _transaltions hash?

For example:

Dictionary.pluck(:description, :description_translations)

and verify that this new implementation correctly returns

[["t", {"en" => "t", "de" => "u" }]]

for current locale being :en

Copy link
Contributor Author

@johnnyshields johnnyshields Jun 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@neilshweky this test is already in this PR - criteria_spec.rb line 2619 for both #pluck and #pluck_each, legacy and non-legacy cases (I copied all the tests you wrote for #pluck.)

          it 'returns both' do
            expect(plucked_translations_both.first).to eq([{"de"=>"deutsch-text", "en"=>"english-text"}, "deutsch-text"])
          end

Github collapses the diff for criteria_spec.rb because it is large so you may have missed it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay great!

Copy link
Contributor

@neilshweky neilshweky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please answer my comments regarding the noramlzied_field_names and add a test?

@johnnyshields
Copy link
Contributor Author

johnnyshields commented Jun 28, 2022

@neilshweky I responded.

@neilshweky neilshweky requested a review from p-mongo June 28, 2022 15:23
@johnnyshields
Copy link
Contributor Author

@p-mongo this PR is ready for review

@p-mongo
Copy link
Contributor

p-mongo commented Jul 19, 2022

This is an optimization for a narrow use case that does not need to exist in Mongoid. To iterate result set in batches you should use the normal .where etc. calls that Mongoid provides.

@johnnyshields
Copy link
Contributor Author

johnnyshields commented Jul 19, 2022

@p-mongo #pluck_each(*fields) greatly reduces the number of object allocations when iterating over large data sets vs. the current alternative (.only(*fields).each). I am seeing it speed up my in-Ruby-memory data analysis jobs by ~10x or more. I'm already using this technique in production in dozens of reports in my app, so I don't consider it a narrow use case at all--many users will benefit.

(I made this PR to "scratch a real-world itch" re: performance of standard queries)

Given the benefit, this refactor has a pretty low "complexity" cost; extracting out PluckEnumerator actually makes the code simpler (less code in the mixin, more code encapsulated in a class.)

@p-mongo
Copy link
Contributor

p-mongo commented Jul 19, 2022

To not construct Mongoid model instances when issuing queries from Mongoid we have https://jira.mongodb.org/browse/MONGOID-5411. If your object allocations are coming from somewhere else please elaborate.

@johnnyshields
Copy link
Contributor Author

@p-mongo I believe MONGOID-5411 and this PR serve similar, but slightly different purposes.

Suppose I have a model User with 100 different fields and some embeds (in the below examples "addresses" will be an embedded field.)

In MONGOID-5411, I'm assuming it adds a criteria modifier (lets call it .as_hashes) which causes the result to be returned as hashes rather than Mongoid::Document instances. Presumably this will work with .only and .each, so I can do:

User.where(name: "Ryan").only(:name, :rank, :addresses).as_hashes.each {|result| puts result }
#=> [ { name: "Ryan", rank: "Private", addresses: [{ street: "Elm Street", city: "Springwood" }] },
#     { name: "Ryan", rank: "Captain", addresses: [] },
#     { name: "Ryan", rank: "Major", addresses: [{ street: "Evergreen Terrace", city: "Springfield" }] } ]

(Please correct me if this is not how you're intending MONGOID-5411 to work)

#pluck_each, like #pluck, can actually do even fewer allocations, especially when working with embedded docs where you can use dot-notation path traversal:

User.where(name: "Ryan").pluck_each(:name, :rank, "addresses.street") {|result| puts result }
#=> [ "Ryan", "Private", ["Elm Street"] ] },
#     "Ryan", "Captain", [] },
#     "Ryan", "Major", ["Evergreen Terrace"] ]

Array creation is about twice the speed of Hash creation with lower memory footprint. #pluck_each is almost certainly to be lighter-weight than MONGOID-5411.

In summary, I think both #pluck_each and MONGOID-5411 are useful and not mutually exclusive. It's good for users to have a choose here, just like pluck vs. only today.

@p-mongo
Copy link
Contributor

p-mongo commented Jul 20, 2022

  1. only supports dot notation.
  2. pluck takes the hashes as they come out of the driver and takes values/makes arrays out of them. It is more expensive than returning the hashes as they are.

@johnnyshields
Copy link
Contributor Author

johnnyshields commented Jul 20, 2022

@p-mongo that sounds reasonable. If MONGOID-5411:

  • supports each (batch iteration with GETMORE)
  • supports only (trims docs) including dot-notation
  • is faster/more memory efficient in cases above than #pluck_each

Then there wouldn't be any need for this PR.

(I do think the refactor here makes the code cleaner regardless for #pluck)

@p-mongo
Copy link
Contributor

p-mongo commented Jul 21, 2022

Closing per the above comments. I extracted pluck-related changes to #5399.

@p-mongo p-mongo closed this Jul 21, 2022
@johnnyshields
Copy link
Contributor Author

Fine to close. I do think this is still useful to have (syntax is more terse than what is being proposed in MONGOID-5411) but can revisit later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants