Bug #21654: Set#new calls extra methods compared to previous versions

Added by tenderlovemaking (Aaron Patterson) 15 days ago. Updated 2 days ago.

Status: Open
Assignee: -
Target version: -
ruby -v: ruby 3.5.0dev (2025-10-24T15:50:47Z master a9f24aaccb) +PRISM [arm64-darwin25]
[ruby-core:123576]

Description

I'm trying to test Ruby 3.5.0 with our Rails application and we've found that Set.new is now causing extra database queries to happen.

The changes in d4020dd5faf call "size" on enumerable objects that are passed to the new method, and this causes extra "COUNT" queries to happen with ActiveRecord associations.

For example:

    Set.new(some_activerecord_association)

Previously, the above code would only do one query, by iterating over the association. Now it issues two queries: a COUNT query, followed by the normal query for results.
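The extra call can be observed outside of Rails with a small stand-in. This is only a sketch: `FakeAssociation` is a hypothetical class, not ActiveRecord, and it simply logs a "query" for each `size` and `each` call.

```ruby
require "set"

# Hypothetical stand-in for an ActiveRecord association (an assumption for
# illustration): #size logs a COUNT "query", #each logs a SELECT "query".
class FakeAssociation
  include Enumerable
  attr_reader :queries

  def initialize(rows)
    @rows = rows
    @queries = []
  end

  def size
    @queries << :count
    @rows.size
  end

  def each(&block)
    @queries << :select
    @rows.each(&block)
    self
  end
end

assoc = FakeAssociation.new([1, 2, 3])
set = Set.new(assoc)

# The log contains :count only on builds where Set.new calls #size on its
# argument; otherwise it contains only the :select from iteration.
p assoc.queries
```

Running this on two Ruby builds and comparing the printed log should show whether `Set.new` touches `size` on that build.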

Since d4020dd5faf is dealing with endless ranges, I would like to narrow the scope from all Enumerable objects to just Ranges. Unfortunately, I noticed we have a test like this:

    assert_raise(ArgumentError) {
      Set.new(1.upto(Float::INFINITY))
    }

I'm not sure how we can handle such a case without testing size.
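For context, a short illustration of why `size` is the obvious thing to test for an Enumerator like the one in that test. It uses only core methods; the variable names are illustrative.

```ruby
infinite = 1.upto(Float::INFINITY)

# Enumerator#size is computed without iterating, so this returns immediately.
p infinite.size == Float::INFINITY  # true

# A finite enumerator reports an Integer.
finite_size = 1.upto(3).size
p finite_size                       # 3

# An Enumerator built without size information reports nil, so #size
# cannot detect every infinite case.
unknown = Enumerator.new { |y| loop { y << 1 } }
p unknown.size                      # nil
```

There is no other cheap signal on a generic Enumerator: unlike a Range, it has no `end` to inspect.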


Related issues: 1 (1 open, 0 closed)

Related to Ruby - Bug #21513: Converting endless range to set hangs - Open - knu (Akinori MUSHA)

Updated by k0kubun (Takashi Kokubun) 15 days ago #1 [ruby-core:123577]

How about handling only Range and Enumerator (not Enumerable) for now? Avoiding an extra DB query on ActiveRecord relations seems like a more important use case than preventing user-defined Enumerable with an infinite length from hanging, unless we already know of an existing use case for it. I think infinite-sized Enumerable classes are often implemented as an Enumerator, so it might still work for such cases too.

Updated by Dan0042 (Daniel DeLorme) 15 days ago #2 [ruby-core:123578]

It seems to me that's an argument in favor of #17924 Range#infinite?

Updated by mame (Yusuke Endoh) 15 days ago #3

  • Related to Bug #21513: Converting endless range to set hangs added

Updated by tenderlovemaking (Aaron Patterson) 15 days ago #4 [ruby-core:123579]

k0kubun (Takashi Kokubun) wrote in #note-1:

How about handling only Range and Enumerator (not Enumerable) for now? Avoiding an extra DB query on ActiveRecord relations seems like a more important use case than preventing user-defined Enumerable with an infinite length from hanging, unless we already know of an existing use case for it. I think infinite-sized Enumerable classes are often implemented as an Enumerator, so it might still work for such cases too.

I think that makes sense. I've got a patch mostly prepared, so I'll submit it.

Updated by mame (Yusuke Endoh) 15 days ago #5 [ruby-core:123581]

 How about handling only Range and Enumerator (not Enumerable) for now?

I think it would be better to handle only Range for now, and not Enumerator either. See https://bugs.ruby-lang.org/issues/21513#note-10

Updated by tenderlovemaking (Aaron Patterson) 15 days ago #6 [ruby-core:123603]

mame (Yusuke Endoh) wrote in #note-5:

 How about handling only Range and Enumerator (not Enumerable) for now?

I think it would be better to handle only Range for now, and not Enumerator either. See https://bugs.ruby-lang.org/issues/21513#note-10

I sent a pull request that only handles Range for now: https://github.com/ruby/ruby/pull/14990

This fixes the issues we're seeing in tests.
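A Range-only check along the lines discussed above might look like the following. This is a sketch written in plain Ruby for illustration, not the code from the pull request; `set_from` is a hypothetical helper name, and the error message is made up.

```ruby
require "set"

# Hypothetical helper (not part of Ruby or of the pull request): reject only
# Ranges whose end is nil, and never call #size on other Enumerables.
def set_from(enum)
  if enum.is_a?(Range) && enum.end.nil?
    raise ArgumentError, "cannot convert an endless range to a set"
  end
  Set.new(enum)
end

p set_from(1..3).to_a  # [1, 2, 3]

begin
  set_from(1..)
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```

Because the check looks only at `Range#end`, a range such as `1..Float::INFINITY` passes through it and is iterated as usual.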

Updated by Dan0042 (Daniel DeLorme) 14 days ago #7 [ruby-core:123613]

@mame (Yusuke Endoh), this PR causes Set.new(1..1/0.0) to hang rather than raise an error. The unit test for this was removed: https://github.com/ruby/ruby/pull/14990/files#diff-841852fda8de5c29b86810572be72d6e29cda4a2f3d179c63b77fb6fb06d09dfL91-L93

Was that really what you meant by "checking if Range#end is nil is good enough"?

Updated by ufuk (Ufuk Kayserilioglu) 14 days ago #8 [ruby-core:123614]

IMO, ranges should behave the same when end == Float::INFINITY and when end == nil. They are both practically endless ranges, but nothing normalizes them so that they always behave the same. Maybe (1..1/0.0) should be internally treated the same as (1..) and have end == nil?

Regardless, I think that is a problem for the Range class to solve, so that other classes don't have to check for nil and infinity and whatever else separately.
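To make the two cases concrete, here is a small sketch of what a caller currently has to check. `unbounded_end?` is a hypothetical helper, not an existing Range method.

```ruby
endless   = (1..)
float_inf = (1..Float::INFINITY)

p endless.end                 # nil
p float_inf.end               # Infinity
p endless == float_inf        # false: the two ends differ, so the ranges differ

# Hypothetical helper: treats a nil end and a +Infinity end the same way.
def unbounded_end?(range)
  e = range.end
  e.nil? || (e.is_a?(Float) && e.infinite? == 1)
end

p unbounded_end?(endless)     # true
p unbounded_end?(float_inf)   # true
p unbounded_end?(1..10)       # false
```

Both ranges report `Float::INFINITY` from `size`, which is the one place where they already agree.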

Updated by mame (Yusuke Endoh) 14 days ago #9 [ruby-core:123615]

Dan0042 (Daniel DeLorme) wrote in #note-7:

Was that really what you meant by "checking if Range#end is nil is good enough"?

Yes. The original issue (#21513) was about the consistency between (1..).to_a and (1..).to_set. Since (1..1/0.0).to_a hangs, I think it's fine if (1..1/0.0).to_set also hangs.
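For reference, the `to_a` side of that comparison can be checked safely, since an endless range raises instead of iterating. The `Float::INFINITY` variant is deliberately not executed here because, as noted above, it hangs.

```ruby
error = nil
begin
  (1..).to_a
rescue RangeError => e
  # Range#to_a refuses endless ranges up front.
  error = e
end

puts "#{error.class}: #{error.message}"

# Not executed: (1..Float::INFINITY).to_a keeps iterating and allocating.
```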

Admittedly, I'm biased: I generally feel that proactively raising exceptions for endless range operations is unnecessary. If an operation hangs, let it hang.

Should we ask @matz (Yukihiro Matsumoto) to decide?

Updated by tenderlovemaking (Aaron Patterson) 14 days ago #10 [ruby-core:123616]

mame (Yusuke Endoh) wrote in #note-9:

Dan0042 (Daniel DeLorme) wrote in #note-7:

Was that really what you meant by "checking if Range#end is nil is good enough"?

Yes. The original issue (#21513) was about the consistency between (1..).to_a and (1..).to_set. Since (1..1/0.0).to_a hangs, I think it's fine if (1..1/0.0).to_set also hangs.

Admittedly, I'm biased: I generally feel that proactively raising exceptions for endless range operations is unnecessary. If an operation hangs, let it hang.

I personally agree with this. If I write an infinite loop, I expect it to loop infinitely, even if I wrote the infinite loop by mistake. Calling to_a or to_set on an infinite range is iterating infinitely.

Should we ask @matz (Yukihiro Matsumoto) to decide?

If it's necessary. Is it necessary? 😅

Updated by jeremyevans0 (Jeremy Evans) 14 days ago #11 [ruby-core:123617]

tenderlovemaking (Aaron Patterson) wrote in #note-10:

mame (Yusuke Endoh) wrote in #note-9:

Dan0042 (Daniel DeLorme) wrote in #note-7:

Was that really what you meant by "checking if Range#end is nil is good enough"?

Yes. The original issue (#21513) was about the consistency between (1..).to_a and (1..).to_set. Since (1..1/0.0).to_a hangs, I think it's fine if (1..1/0.0).to_set also hangs.

Admittedly, I'm biased: I generally feel that proactively raising exceptions for endless range operations is unnecessary. If an operation hangs, let it hang.

I personally agree with this. If I write an infinite loop, I expect it to loop infinitely, even if I wrote the infinite loop by mistake. Calling to_a or to_set on an infinite range is iterating infinitely.

I agree with @mame (Yusuke Endoh) and @tenderlovemaking (Aaron Patterson). Removing special handling of infinite ranges avoids the original issue (calling size), as well as avoiding the nil vs Infinity range end issue. I think we should just revert d4020dd5faf28486123853e7f00c36139fc07793.

Updated by Dan0042 (Daniel DeLorme) 13 days ago · Edited #12 [ruby-core:123618]

If an operation hangs, let it hang.

I don't agree with that. If an operation crashes, yes, let it crash. But if it hangs, that's an order of magnitude or two harder to debug. Even more so if the "hang" is accompanied by unrestrained memory allocation, which can then bring out the OOM killer and/or crash the whole system. So if it's at all possible to raise early, then please let's do so.

Updated by mame (Yusuke Endoh) 13 days ago #13 [ruby-core:123621]

I understand your point, but I personally disagree.

My concern is that the expectation to "if it's at all possible to raise early" is a slippery slope with no clear boundary.

For example, an operation like (1..1<<100).to_a will also effectively hang (or crash with an OOM) on any modern computer. This seems to fit the "unrestrained memory allocation" case you mentioned. Should we also preemptively raise an exception for this? If we do, what should the threshold be?
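As a quick illustration of that example: the range is finite as far as `size` is concerned, so a check for infinity alone would not catch it. The variable names are illustrative.

```ruby
huge = (1..1 << 100)

# Range#size is computed arithmetically, without iterating.
p huge.size == 2**100                # true
p huge.size.is_a?(Integer)           # true

# An endless range, by contrast, reports Float::INFINITY.
p (1..).size == Float::INFINITY      # true
```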

That said, I recall @matz (Yukihiro Matsumoto) expressing a similar concern to yours. I would like to hear his opinion on this matter.

Updated by tenderlovemaking (Aaron Patterson) 13 days ago #14 [ruby-core:123624]

mame (Yusuke Endoh) wrote in #note-13:

I understand your point, but I personally disagree.

My concern is that the expectation to "if it's at all possible to raise early" is a slippery slope with no clear boundary.

For example, an operation like (1..1<<100).to_a will also effectively hang (or crash with an OOM) on any modern computer. This seems to fit the "unrestrained memory allocation" case you mentioned. Should we also preemptively raise an exception for this? If we do, what should the threshold be?

💯

That said, I recall @matz (Yukihiro Matsumoto) expressing a similar concern to yours. I would like to hear his opinion on this matter.

I'll add this to the dev meeting. I think we should just revert d4020dd5faf28486123853e7f00c36139fc07793

Updated by Dan0042 (Daniel DeLorme) 12 days ago · Edited #15 [ruby-core:123629]

mame (Yusuke Endoh) wrote in #note-13:

My concern is that the expectation to "if it's at all possible to raise early" is a slippery slope with no clear boundary.

I don't think it's slippery at all. There's a very clear difference between "infinite" and "very large but finite". I didn't say "if it's at all possible to raise early" in any situation, this is specifically about infinity. If we were in a situation where it's hard to determine if something will run infinitely, then sure there's no helping it. But I don't think that's the case here.

tenderlovemaking (Aaron Patterson) wrote in #note-10:

If I write an infinite loop, I expect it to loop infinitely, even if I wrote the infinite loop by mistake. Calling to_a or to_set on an infinite range is iterating infinitely.

If I write something like loop{ ary << 42 } I also expect it to loop infinitely and consume all memory. Because I wrote that loop. But in the case of infinite_range.to_a I did not write any loop. The loop happens in ruby internally. I expect ruby to perform reasonable validation on its inputs. This should raise an exception for the same reason that 42 / 0 raises ZeroDivisionError and not segfault.

Updated by tenderlovemaking (Aaron Patterson) 9 days ago #16 [ruby-core:123669]

In any case, I think we should remove the code from Set.new. The original ticket was about the behavior of Range#to_set vs Range#to_a, not about Set.new. At least Set.new's behavior should be restored.

Updated by knu (Akinori MUSHA) 2 days ago #17 [ruby-core:123758]

I agree.

  • The implementation of Set.new can be naïve, that is, it shouldn't perform size checks or similar validations.
  • Enumerable#to_set can simply delegate everything to Set.new as it is now.
  • Adding Range#to_set and Enumerator#to_set overrides that perform size checks would help users avoid most common pitfalls.
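The split described in the list above could be sketched roughly as follows. This is an assumption-laden illustration, not a proposed patch: `checked_to_set` is a hypothetical top-level helper standing in for the `Range#to_set` and `Enumerator#to_set` overrides, and the error message is made up.

```ruby
require "set"

# Hypothetical helper standing in for Range#to_set / Enumerator#to_set
# overrides. Only Range and Enumerator are size-checked; every other
# Enumerable goes straight to Set.new, which does no validation here.
def checked_to_set(enum)
  case enum
  when Range, Enumerator
    # #size returns Float::INFINITY for known-infinite inputs, nil when the
    # size is unknown, and an Integer otherwise. Ranges that cannot be
    # iterated at all are out of scope for this sketch.
    if enum.size == Float::INFINITY
      raise ArgumentError, "cannot convert an infinite #{enum.class} to a set"
    end
  end
  Set.new(enum)
end

p checked_to_set(1..3).to_a         # [1, 2, 3]
p checked_to_set(%w[a a b]).to_a    # ["a", "b"]

[(1..), (1..Float::INFINITY), 1.upto(Float::INFINITY)].each do |input|
  begin
    checked_to_set(input)
  rescue ArgumentError => e
    puts "ArgumentError: #{e.message}"
  end
end
```

An Enumerator whose `size` is `nil` still passes the check, so this covers the common pitfalls rather than every infinite input.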