-
Notifications
You must be signed in to change notification settings - Fork 468
Removes unnecessary shuffle from randomSample method of Collection and Sequence #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
randomSample method already swap selected element with a randomly chosen one in the reservoir.
randomSample method already swap selected element with a randomly chosen one in the reservoir.
|
Whether the shuffle is unnecessary or not is a mathematical question. I'm not sure I understand how you arrived at the conclusion that "there is no need to do that" based on your description here, although perhaps it is straightforward. Can you show a proof as to why each element of the original element is equally likely to found at any particular position of the result even without this shuffle? |
|
Edit: Disregard, the guide goes into this. The shuffle is necessary. The elements in let j = Int.random(in: 0..<result.count, using: &rng)but the first |
|
Thanks @timvermeulen for clarifying. i guess it's important now to look at what reservoir sampling algorithm says, it works pretty much in two steps:
According to reservoir sampling algorithm each number is selected with the probability of Example:
That said, giving each element equal probability of being selected i think shuffling the reservoir is unnecessary step perhaps a side effect of |
This doesn't address the problem that is at discussion here, which is that not all the numbers have equal probability of being picked first. See the README for more details. |
|
Ah, I clearly misunderstood the purpose of the TODO. Whether we need to shuffle is not a mathematical question, it's a question of whether or not we want the order of the sample to be random as well. |
|
Right — this is about the surprise factor of the result sometimes having elements in their initial order, not the likelihood or weighting of particular elements in the result. While random sampling doesn't necessarily have to guarantee that the randomly selected elements will also be in a random order, I'm concerned that it will be surprising for the user to have that not be the case. The related guide covers the topic in the Post Sample Shuffling section. Since this is an intentional design decision, I'm going to close this for now. Feel free to open a discussion in the forums if you'd like to talk about this more. |
This PR removes unnecessary shuffle from
randomSamplemethod of Collection and Sequence.Since randomSample method already Swap selected element with a randomly chosen one in the reservoir there is no need to do that extra shuffle for the reservoir. This was marked as
// FIXMEby removing it,randomSamplestill randomly selects the specified number of elements from this sequence, and theRandomSampleTestsruns successfully.Checklist