Bug #21654
Set#new calls extra methods compared to previous versions
Description
I'm trying to test Ruby 3.5.0 with our Rails application and we've found that Set.new is now causing extra database queries to happen.
The changes in d4020dd5faf call "size" on enumerable objects that are passed to the new method, and this causes extra "COUNT" queries to happen with ActiveRecord associations.
For example:
Set.new(some_activerecord_association)
Previously, the above code would only do one query by iterating over the association. Now it issues two queries, a count query, and then the normal query for results.
Since d4020dd5faf is dealing with endless ranges, I would like to narrow the scope from all Enumerable objects to just Ranges. Unfortunately, I noticed we have a test like this:
assert_raise(ArgumentError) {
Set.new(1.upto(Float::INFINITY))
}
I'm not sure how we can handle such a case without testing size.
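The extra call described above can be observed without a database. The following is a minimal sketch, assuming a hypothetical RecordingCollection class that stands in for an ActiveRecord association and records method calls instead of issuing queries. It is not how ActiveRecord is implemented.

```ruby
require "set"

# Hypothetical stand-in for an ActiveRecord association: it records which
# methods are invoked instead of issuing SQL queries.
class RecordingCollection
  include Enumerable

  attr_reader :calls

  def initialize(items)
    @items = items
    @calls = []
  end

  # On a real association this would correspond to a COUNT query.
  def size
    @calls << :size
    @items.size
  end

  # On a real association this would correspond to the query that loads rows.
  def each(&block)
    @calls << :each
    @items.each(&block)
    self
  end
end

collection = RecordingCollection.new([1, 2, 3])
set = Set.new(collection)

# Which calls show up depends on the Ruby version: with d4020dd5faf applied,
# :size is expected to be recorded in addition to :each.
p collection.calls
```

Running this under two Ruby builds and comparing the printed list is one way to confirm whether Set.new touches size on a given version.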
Updated by k0kubun (Takashi Kokubun) 15 days ago
How about handling only Range and Enumerator (not Enumerable) for now? Avoiding an extra DB query on ActiveRecord relations seems like a more important use case than preventing user-defined Enumerable with an infinite length from hanging, unless we already know of an existing use case for it. I think infinite-sized Enumerable classes are often implemented as an Enumerator, so it might still work for such cases too.
Updated by Dan0042 (Daniel DeLorme) 15 days ago
It seems to me that's an argument in favor of #17924 Range#infinite?
Updated by mame (Yusuke Endoh) 15 days ago
- Related to Bug #21513: Converting endless range to set hangs added
Updated by tenderlovemaking (Aaron Patterson) 15 days ago
k0kubun (Takashi Kokubun) wrote in #note-1:
How about handling only Range and Enumerator (not Enumerable) for now? Avoiding an extra DB query on ActiveRecord relations seems like a more important use case than preventing user-defined Enumerable with an infinite length from hanging, unless we already know of an existing use case for it. I think infinite-sized Enumerable classes are often implemented as an Enumerator, so it might still work for such cases too.
I think that makes sense. I've got a patch mostly prepared, so I'll submit it.
Updated by mame (Yusuke Endoh) 15 days ago
How about handling only Range and Enumerator (not Enumerable) for now?
I think it would be better to handle only Range for now, and not Enumerator either. See https://bugs.ruby-lang.org/issues/21513#note-10
Updated by tenderlovemaking (Aaron Patterson) 15 days ago
mame (Yusuke Endoh) wrote in #note-5:
How about handling only Range and Enumerator (not Enumerable) for now?
I think it would be better to handle only Range for now, and not Enumerator either. See https://bugs.ruby-lang.org/issues/21513#note-10
I sent a pull request that only handles Range for now: https://github.com/ruby/ruby/pull/14990
This fixes the issues we're seeing in tests.
Updated by Dan0042 (Daniel DeLorme) 14 days ago
@mame (Yusuke Endoh), this PR causes Set.new(1..1/0.0) to hang rather than raise an error. The unit test for this was removed: https://github.com/ruby/ruby/pull/14990/files#diff-841852fda8de5c29b86810572be72d6e29cda4a2f3d179c63b77fb6fb06d09dfL91-L93
Was that really what you meant by "checking if Range#end is nil is good enough"?
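To illustrate the gap being pointed out, here is a minimal sketch, assuming the narrowed check amounts to testing whether Range#end is nil. The endless_range? helper is hypothetical and is not the code from the pull request.

```ruby
# Hypothetical predicate, not the actual code from the pull request.
# It only recognizes ranges whose end is nil.
def endless_range?(object)
  object.is_a?(Range) && object.end.nil?
end

p endless_range?((1..))                    # true: the end is nil
p endless_range?((1..Float::INFINITY))     # false: the end is a Float, not nil
p endless_range?(1.upto(Float::INFINITY))  # false: an Enumerator, not a Range
```

Under that assumption, (1..1/0.0) and 1.upto(Float::INFINITY) pass through unchecked, so building a Set from either would iterate without stopping.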
Updated by ufuk (Ufuk Kayserilioglu) 14 days ago
IMO, ranges should behave the same when end == Float::INFINITY and when end == nil. They are both practically endless ranges, but nothing normalizes them so that they always behave the same. Maybe (1..1/0.0) should be internally treated the same as (1..) and have end == nil?
Regardless, I think that is a problem for the Range class to solve, so that other classes don't have to check for nil and infinity and whatever else separately.
Updated by mame (Yusuke Endoh) 14 days ago
Dan0042 (Daniel DeLorme) wrote in #note-7:
Was that really what you meant by "checking if Range#end is nil is good enough"?
Yes. The original issue (#21513) was about the consistency between (1..).to_a and (1..).to_set. Since (1..1/0.0).to_a hangs, I think it's fine if (1..1/0.0).to_set also hangs.
Admittedly, I'm biased: I generally feel that proactively raising exceptions for endless range operations is unnecessary. If an operation hangs, let it hang.
Should we ask @matz (Yukihiro Matsumoto) to decide?
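For reference, the Range#to_a side of that comparison can be checked directly. This is a small sketch: the nil-ended range raises immediately, while the Float::INFINITY-ended range is only inspected here and never converted, because converting it would not terminate.

```ruby
# (1..).to_a raises instead of iterating.
raised = begin
  (1..).to_a
  false
rescue RangeError
  true
end

p raised  # true

# The two ranges differ in their end value but report the same size.
p (1..).end                     # nil
p (1..Float::INFINITY).end      # Infinity
p (1..).size                    # Infinity
p (1..Float::INFINITY).size     # Infinity

# (1..Float::INFINITY).to_a is intentionally not executed: it does not return.
```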
Updated by tenderlovemaking (Aaron Patterson) 14 days ago
mame (Yusuke Endoh) wrote in #note-9:
Dan0042 (Daniel DeLorme) wrote in #note-7:
Was that really what you meant by "checking if Range#end is nil is good enough"?
Yes. The original issue (#21513) was about the consistency between (1..).to_a and (1..).to_set. Since (1..1/0.0).to_a hangs, I think it's fine if (1..1/0.0).to_set also hangs.
Admittedly, I'm biased: I generally feel that proactively raising exceptions for endless range operations is unnecessary. If an operation hangs, let it hang.
I personally agree with this. If I write an infinite loop, I expect it to loop infinitely, even if I wrote the infinite loop by mistake. Calling to_a or to_set on an infinite range is iterating infinitely.
Should we ask @matz (Yukihiro Matsumoto) to decide?
If it's necessary. Is it necessary? 😅
Updated by jeremyevans0 (Jeremy Evans) 14 days ago
tenderlovemaking (Aaron Patterson) wrote in #note-10:
mame (Yusuke Endoh) wrote in #note-9:
Dan0042 (Daniel DeLorme) wrote in #note-7:
Was that really what you meant by "checking if Range#end is nil is good enough"?
Yes. The original issue (#21513) was about the consistency between (1..).to_a and (1..).to_set. Since (1..1/0.0).to_a hangs, I think it's fine if (1..1/0.0).to_set also hangs.
Admittedly, I'm biased: I generally feel that proactively raising exceptions for endless range operations is unnecessary. If an operation hangs, let it hang.
I personally agree with this. If I write an infinite loop, I expect it to loop infinitely, even if I wrote the infinite loop by mistake. Calling to_a or to_set on an infinite range is iterating infinitely.
I agree with @mame (Yusuke Endoh) and @tenderlovemaking (Aaron Patterson). Removing special handling of infinite ranges avoids the original issue (calling size), as well as avoiding the nil vs Infinity range end issue. I think we should just revert d4020dd5faf28486123853e7f00c36139fc07793.
Updated by Dan0042 (Daniel DeLorme) 13 days ago
If an operation hangs, let it hang.
I don't agree with that. If an operation crashes, yes, let it crash. But if it hangs, that's an order of magnitude or two harder to debug. Even more so if the "hang" is accompanied by unrestrained memory allocation, which can then bring out the OOM and/or crash the whole system. So if it's at all possible to raise early, then please let's do so.
Updated by mame (Yusuke Endoh) 13 days ago
I understand your point, but I personally disagree.
My concern is that the expectation to "if it's at all possible to raise early" is a slippery slope with no clear boundary.
For example, an operation like (1..1<<100).to_a will also effectively hang (or crash with an OOM) on any modern computer. This seems to fit the "unrestrained memory allocation" case you mentioned. Should we also preemptively raise an exception for this? If we do, what should the threshold be?
That said, I recall @matz (Yukihiro Matsumoto) expressing a similar concern to yours. I would like to hear his opinion on this matter.
Updated by tenderlovemaking (Aaron Patterson) 13 days ago
mame (Yusuke Endoh) wrote in #note-13:
I understand your point, but I personally disagree.
My concern is that the expectation to "if it's at all possible to raise early" is a slippery slope with no clear boundary.
For example, an operation like (1..1<<100).to_a will also effectively hang (or crash with an OOM) on any modern computer. This seems to fit the "unrestrained memory allocation" case you mentioned. Should we also preemptively raise an exception for this? If we do, what should the threshold be?
💯
That said, I recall @matz (Yukihiro Matsumoto) expressing a similar concern to yours. I would like to hear his opinion on this matter.
I'll add this to the dev meeting. I think we should just revert d4020dd5faf28486123853e7f00c36139fc07793
Updated by Dan0042 (Daniel DeLorme) 12 days ago
mame (Yusuke Endoh) wrote in #note-13:
My concern is that the expectation to "if it's at all possible to raise early" is a slippery slope with no clear boundary.
I don't think it's slippery at all. There's a very clear difference between "infinite" and "very large but finite". I didn't say "if it's at all possible to raise early" in any situation, this is specifically about infinity. If we were in a situation where it's hard to determine if something will run infinitely, then sure there's no helping it. But I don't think that's the case here.
tenderlovemaking (Aaron Patterson) wrote in #note-10:
If I write an infinite loop, I expect it to loop infinitely, even if I wrote the infinite loop by mistake. Calling to_a or to_set on an infinite range is iterating infinitely.
If I write something like loop{ ary << 42 } I also expect it to loop infinitely and consume all memory. Because I wrote that loop. But in the case of infinite_range.to_a I did not write any loop. The loop happens in ruby internally. I expect ruby to perform reasonable validation on its inputs. This should raise an exception for the same reason that 42 / 0 raises ZeroDivisionError and not segfault.
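The distinction between "infinite" and "very large but finite" is visible in what Range#size returns. A short sketch, using the (1..1<<100) range from the earlier comment:

```ruby
huge    = (1..(1 << 100))  # very large, but finite
endless = (1..)            # no end at all

# A finite Integer range reports an Integer size, however large.
p huge.size.is_a?(Integer)          # true
p huge.size == 2**100               # true

# An endless range reports Float::INFINITY.
p endless.size == Float::INFINITY   # true
```

Neither range is converted to an array or a set here; only the reported size is inspected, which is cheap in both cases.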
Updated by tenderlovemaking (Aaron Patterson) 9 days ago
In any case, I think we should remove the code from Set.new. The original ticket was about the behavior of Range#to_set vs Range#to_a, not about Set.new. At least Set.new's behavior should be restored.
Updated by knu (Akinori MUSHA) 2 days ago
I agree.
- The implementation of Set.new can be naïve, that is, it shouldn't perform size checks or similar validations.
- Enumerable#to_set can simply delegate everything to Set.new as it is now.
- Adding Range#to_set and Enumerator#to_set overrides that perform size checks would help users avoid most common pitfalls.
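The split proposed in the list above could look roughly like the following. This is only a sketch under stated assumptions: checked_to_set is a hypothetical free-standing helper used here instead of real Range#to_set and Enumerator#to_set overrides, and the error message is made up for illustration.

```ruby
require "set"

# Hypothetical helper standing in for Range#to_set / Enumerator#to_set
# overrides. Only Range and Enumerator are asked for their size; any other
# Enumerable goes straight to Set.new, which stays free of validation.
def checked_to_set(source)
  if source.is_a?(Range) || source.is_a?(Enumerator)
    size = source.size # may be nil when the size is unknown
    if size == Float::INFINITY
      # The message text is an assumption, not Ruby's actual wording.
      raise ArgumentError, "cannot convert an infinite #{source.class} to a Set"
    end
  end
  Set.new(source)
end

p checked_to_set(1..3)             # finite Range
p checked_to_set([3, 3, 4])        # Array: no size check is performed here
p checked_to_set(1.upto(3))        # finite Enumerator

begin
  checked_to_set(1.upto(Float::INFINITY))
rescue ArgumentError => e
  puts e.message
end
```

With this shape, an ActiveRecord association passed to Set.new would not be asked for its size by the helper, since it is neither a Range nor an Enumerator.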