See Slack for more context.

Right now we say:

tokens_weight_threshold: Tokens whose weight is less than tokens_weight_threshold are considered insignificant and pruned. This value must be between 0 and 1. Default: 0.4.
This is misleading.
By setting the tokens_freq_ratio_threshold to 10, you are saying that in order to be pruned, a token must be 10x more frequent than the average token frequency across all tokens in all documents for that field. This is higher than the default of 5, so you’re dialing this back and requiring tokens to be even more frequent in order to be pruned. In practice, I would expect this to prune only extremely common tokens - think common words like "is" and "the", for example.
By setting the tokens_weight_threshold to 0.4, you are saying that you want to take the best scoring token and never prune any token whose score is more than 40% of that best score. Because scores can vary so widely across text search results, we can’t issue a blanket “this is the minimum score” and still expect consistently good results. Instead, let’s say your top score was 0.2. That means that in order to be pruned, a token’s score would have to be below 0.4 × 0.2 = 0.08.
Both of those criteria must be met for a token to be pruned.
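As a rough illustration only (this is not the actual Elasticsearch implementation; the function name, argument names, and the frequency numbers below are made up for the example), the combined decision could be modeled like this:

```python
def should_prune(token_freq: float, avg_token_freq: float,
                 token_weight: float, best_token_weight: float,
                 tokens_freq_ratio_threshold: float = 5.0,
                 tokens_weight_threshold: float = 0.4) -> bool:
    """Sketch of the pruning decision described above.

    A token is pruned only if BOTH conditions hold:
    1. its frequency is more than tokens_freq_ratio_threshold times the
       average token frequency for the field, and
    2. its weight is less than tokens_weight_threshold times the best
       (highest) token weight.
    """
    too_frequent = token_freq > tokens_freq_ratio_threshold * avg_token_freq
    too_weak = token_weight < tokens_weight_threshold * best_token_weight
    return too_frequent and too_weak


# Worked example from the text: with a best weight of 0.2 and
# tokens_weight_threshold=0.4, a token's weight must fall below
# 0.4 * 0.2 = 0.08 (and the token must also be very frequent) to be pruned.
print(should_prune(token_freq=60, avg_token_freq=5,
                   token_weight=0.05, best_token_weight=0.2,
                   tokens_freq_ratio_threshold=10,
                   tokens_weight_threshold=0.4))  # True: both criteria met
print(should_prune(token_freq=60, avg_token_freq=5,
                   token_weight=0.15, best_token_weight=0.2,
                   tokens_freq_ratio_threshold=10,
                   tokens_weight_threshold=0.4))  # False: weight above 0.08
```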