The recommended source audience size is between 5,000 and 50,000 customers. With too few customers, there is unlikely to be enough data to estimate the probability distribution of each feature. With too many, the distributions for most features end up close to the population distribution. Note that the recommendation refers to the effective source audience size, not the input size: randomly selecting 5,000 customers from a large segment of 500,000 does not help. The effective audience size in that case is still 500,000, which is too general (~4.5% of all Coupang customers).
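The distinction between input size and effective size can be made concrete with a small check. The following is only an illustrative sketch: the function name `check_source_audience`, the `segment_size` parameter, and the returned labels are hypothetical and not part of any existing tool. It assumes that when the input was randomly sampled from a larger segment, the effective size is the size of that segment.

```python
from typing import Optional

# Recommended bounds taken from the guidance above.
RECOMMENDED_MIN = 5_000
RECOMMENDED_MAX = 50_000


def check_source_audience(input_size: int, segment_size: Optional[int] = None) -> str:
    """Classify a source audience by its effective size (hypothetical helper).

    input_size:   number of customers actually passed in.
    segment_size: size of the segment the input was randomly sampled from,
                  or None if the input is the whole segment (assumption).
    """
    # A random sample mirrors the segment it came from, so the segment
    # size, not the sample size, is what counts as the effective size.
    effective_size = segment_size if segment_size is not None else input_size

    if effective_size < RECOMMENDED_MIN:
        return "too_small"      # likely not enough data per feature
    if effective_size > RECOMMENDED_MAX:
        return "too_general"    # distributions approach the population's
    return "ok"


if __name__ == "__main__":
    # 5,000 customers sampled at random from a 500,000-customer segment.
    print(check_source_audience(5_000, segment_size=500_000))
    # A 20,000-customer segment used in full.
    print(check_source_audience(20_000))
```

Under these assumptions, the random sample of 5,000 is still classified as too general, because its effective size is that of the 500,000-customer segment it was drawn from.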