[Request] Clarify token pruning docs #481

kderusso · 2025-02-18T14:17:37Z

See Slack for more context.

Right now we say:

tokens_weight_threshold: Tokens whose weight is less than tokens_weight_threshold are considered insignificant and pruned. This value must be between 0 and 1. Default: 0.4.

This is misleading.

By setting the tokens_freq_ratio_threshold to 10, you are saying that in order to be pruned, a token must be more than 10x as frequent as the average token across all tokens in all documents for that field. This is higher than the default of 5, so you’re dialing pruning back and requiring tokens to be even more frequent in order to be pruned. In practice, I would expect this to prune only extremely common tokens, such as is and the.

By setting the tokens_weight_threshold to 0.4, you are saying that you want to take the best-scoring token and never prune anything that scores more than 40% of that score. Because scores can vary so widely across any given set of text search results, we can’t issue a blanket “this is the minimum score” and still expect consistently good results, so the threshold is relative to the top score instead. For example, let’s say your top score was 0.2. That means that in order to be pruned, a token’s score would have to be below 0.08 (0.4 × 0.2).

Both of those criteria must match for a token to be pruned.
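
To make the combined rule concrete, here is a rough sketch of the logic as described in this issue. It is not the actual Elasticsearch implementation: the function name, its arguments, and the strict comparisons at the boundaries are assumptions for illustration only.

```python
def should_prune(token_freq, avg_token_freq, token_weight, best_weight,
                 freq_ratio_threshold=5.0, weight_threshold=0.4):
    """Hypothetical helper: True if a token would be pruned under the
    rule described in this issue (both criteria must match)."""
    # Criterion 1: the token is more than `freq_ratio_threshold` times as
    # frequent as the average token for the field.
    too_frequent = token_freq > freq_ratio_threshold * avg_token_freq
    # Criterion 2: the token's weight is below `weight_threshold` times the
    # best-scoring token's weight (a relative cutoff, not an absolute one).
    too_weak = token_weight < weight_threshold * best_weight
    return too_frequent and too_weak


# With a top score of 0.2 and tokens_weight_threshold = 0.4,
# the weight cutoff is 0.4 * 0.2 = 0.08.
print(should_prune(12, 1, 0.05, 0.2, freq_ratio_threshold=10))  # frequent and weak
print(should_prune(12, 1, 0.10, 0.2, freq_ratio_threshold=10))  # frequent, not weak
print(should_prune(8, 1, 0.05, 0.2, freq_ratio_threshold=10))   # weak, not frequent enough
```

Only the first call returns True. The second token scores above the 0.08 cutoff, and the third is only 8x the average frequency, so each fails one of the two criteria and is kept.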


bmorelli25 added needs-team Issues pending triage by the Docs Team Team:Platform Issues owned by the Platform Docs Team labels Apr 17, 2025

github-actions bot removed the needs-team Issues pending triage by the Docs Team label Apr 17, 2025

georgewallace added Team:Search Issues owned by the Search Docs Team and removed Team:Platform Issues owned by the Platform Docs Team labels Apr 30, 2025
