-
-
Notifications
You must be signed in to change notification settings - Fork 980
feat(number): add distributor functions #3375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: next
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for fakerjs ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## next #3375 +/- ##
==========================================
- Coverage 99.97% 99.97% -0.01%
==========================================
Files 2811 2814 +3
Lines 217414 217443 +29
Branches 949 959 +10
==========================================
+ Hits 217359 217387 +28
- Misses 55 56 +1
🚀 New features to boost your workflow:
|
I'm also not sure whether |
I don't like calling it "exponentialDistribution " because you're not returning a distribution, you are returning a single value from that distribution. So I'd call it exponentialValue or floatExponential or something similar to emphasise the thing that is returned is a float or a value. |
I'm not even sure that is the case.
Good idea. I'm not finally sure which term I would like to use but stepping away from calling it I'm currently considering one of these:
|
Team decision The feature will be implemented by extending the already existing methods ( We are currently unsure how to name the parameter. While distribution is okay, we are not sure if this is the mathmatical correct term. |
Ready for review. I changed the implementations to use a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The more I'm looking at this PR, the more I'm falling in love with this feature's design. ❤️
I found some small things that I was noticing, while trying to get back into this topic. @ST-DDT are you still available or can I take over the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To not couple the number tests with the implementation details of our given distributor functions, we should probably use custom, "deterministic" ones in theses test cases. Additionally, we should add separate test cases for the provided distributor functions. For example: The exponential distributor is currently not being checked for a negative "base" input.
@@ -49,6 +58,18 @@ function comparableSanitizedHtml(html: string): string { | |||
.replaceAll(' ', ''); | |||
} | |||
|
|||
/** | |||
* Wraps the given code in a code block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Wraps the given code in a code block. | |
* Wraps the given code in a markdown code block. |
* const alwaysMin: Distributor = () => 0; | ||
* const uniform: Distributor = (randomizer: Randomizer) => randomizer.next(); | ||
* | ||
* faker.number.int({ min: 2, max: 10, distributor: alwaysMin }); // 2 | ||
* faker.number.int({ min: 0, max: 10, distributor: uniform }); // 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The exemplary calls should probably be repeated to better visualize the results:
* const alwaysMin: Distributor = () => 0; | |
* const uniform: Distributor = (randomizer: Randomizer) => randomizer.next(); | |
* | |
* faker.number.int({ min: 2, max: 10, distributor: alwaysMin }); // 2 | |
* faker.number.int({ min: 0, max: 10, distributor: uniform }); // 5 | |
* const alwaysMin: Distributor = () => 0; | |
* faker.number.int({ min: 2, max: 10, distributor: alwaysMin }); // 2 | |
* faker.number.int({ min: 2, max: 10, distributor: alwaysMin }); // 2 | |
* faker.number.int({ min: 2, max: 10, distributor: alwaysMin }); // 2 | |
* const uniform: Distributor = (randomizer: Randomizer) => randomizer.next(); | |
* faker.number.int({ min: 0, max: 10, distributor: uniform }); // 5 | |
* faker.number.int({ min: 0, max: 10, distributor: uniform }); // 2 | |
* faker.number.int({ min: 0, max: 10, distributor: uniform }); // 9 |
* A function that determines the distribution of generated values. | ||
* Values generated by a randomizer are considered uniformly distributed, distributor functions can be used to change this. | ||
* If many results are collected the results form a limited distribution between `0` and `1`. | ||
* So an exponential distributor will values resemble a limited exponential distribution. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* So an exponential distributor will values resemble a limited exponential distribution. | |
* So an exponential distributor's values will resemble a limited exponential distribution. |
import { uniformDistributor } from './uniform'; | ||
|
||
/** | ||
* Creates a new function that generates power-law/exponentially distributed values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can add a more accessible explanation what this means as well.
* Creates a new function that generates power-law/exponentially distributed values. | |
* Creates a new function that generates exponentially distributed values. | |
* This means that values on the edges (min or max based on configuration) are more likely to be generated. |
This needs to be put in the second signature as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT this is a power law distribution and not a exponential distribution in the mathematical sense.
This means that values on one side of the of the value range are far more likely than the values on the other side of the range (depending on configuration).
Or something similar.
export function exponentialDistributor( | ||
options?: | ||
| { | ||
/** | ||
* The base of the exponential distribution. Should be greater than 0. | ||
* The higher/more above `1` the `base`, the more likely the number will be closer to the minimum value. | ||
* The lower/closer to zero the `base`, the more likely the number will be closer to the maximum value. | ||
* Values of `1` will generate a uniform distribution. | ||
* Can alternatively be configured using the `bias` option. | ||
* | ||
* @default 2 | ||
*/ | ||
base?: number; | ||
} | ||
| { | ||
/** | ||
* An alternative way to specify the `base`. Also accepts values below zero. | ||
* The higher/more positive the `bias`, the more likely the number will be closer to the maximum value. | ||
* The lower/more negative the `bias`, the more likely the number will be closer to the minimum value. | ||
* Values of `0` will generate a uniform distribution. | ||
* Can alternatively be configured using the `base` option. | ||
* | ||
* @default -1 | ||
*/ | ||
bias?: number; | ||
} | ||
): Distributor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a reason why we didn't made these two dedicated functions?
The advantages would be that it should be more obvious to the end user that you can not pass bias AND base at the same time. Technically this is already the way, but the signatures could more detailed examples based on the input that is possible. The signature containing the bias argument could only contain table with the bias values. Same with the base.
The disadvantage would be that we would still require a third combined signature (same as we have right now) for our documentation generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting the options into two separate signatures is fine.
Maybe we did this for discoverability of both options or maybe this wasn't finished in the first place.
Feel free to take over. |
What is the point of the uniformDistributor? Isn't that the same as not passing a distributor at all? |
Should we provide a sample normal distribution function (it wouldn't be a true normal distribution because it's bounded at the min and max). But I think it would be a fairly common use case to want something like "generate me heights in centimetres, with a mean of 170cm and standard deviation of 7cm." So I think we could provide a "recipe" for that kind of distribution even if it's not strictly normal. |
To allow for using a non-nullish distributor value. |
That's fine maybe we could just clarify that in the documentation. Otherwise a developer may set the option and wonder why "nothing happens". |
EDIT/fix: |
You might want to start without it first.
|
I am a mathematician 😀 I will have a go at something that would meet expectations. |
Implements #1862
Adds a
distributor
option to faker.number.int and float. The distributor function can be used to influence the distribution of the generated values. E.g. uniformDistributor and exponentialDistributor.The following table shows the rough distribution of values generated using
exponentialDistributor({ base: x })
:The exponential distribution is achieved by using
base ** exponent
where exponent is a random float between 0 and 1.The result is then stretched evenly to fit to
min
andmax
.