Closed
Description
Minimal version of the patterned text mapping. Use simple parsing to decompose each log message into template, timestamp, and argument doc values. Use a single inverted index built from the original message. Likely only support boolean term queries, similar to match_only_text. Potentially extend to inefficient phrase queries using the reconstructed message.
This will largely follow the work in the previous pattern text prototype: #124323
Behavior
- input text will be split on a hard-coded list of delimiters
- tokens which contain a digit will be arguments, otherwise will be part of template
- template tokens will be joined with a placeholder which stands in for the extracted args
- template with placeholders will be stored in a SortedDocValues field
- args will be stored in a single SortedSetDocValues field
- Docs are sorted by template, then by arguments
- Original message text is indexed, likely without positions.
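
A rough sketch of the split/classify/join steps listed above, in Python for brevity rather than the Java the real mapper would use. The delimiter set, the `%A` placeholder, and the function names `decompose` and `reconstruct` are all assumptions made for illustration; the issue only says the delimiter list is hard-coded and does not name the placeholder.

```python
import re

# Hypothetical delimiter set; the issue only states that the list is hard-coded.
DELIMITERS = " \t,;=[]()"
# Hypothetical placeholder standing in for one extracted argument.
PLACEHOLDER = "%A"

# The capturing group makes re.split return the delimiters as well,
# so the template keeps the original spacing and punctuation.
_SPLIT = re.compile("([" + re.escape(DELIMITERS) + "])")


def decompose(message):
    """Split a message into (template, args) using the digit rule."""
    template_parts = []
    args = []
    for piece in _SPLIT.split(message):
        if any(ch.isdigit() for ch in piece):
            # A token containing a digit is treated as an argument.
            args.append(piece)
            template_parts.append(PLACEHOLDER)
        else:
            # Delimiters and digit-free tokens stay in the template.
            template_parts.append(piece)
    return "".join(template_parts), args


def reconstruct(template, args):
    """Rebuild the message by filling placeholders in order.

    Assumes the original message never contains PLACEHOLDER literally.
    """
    parts = template.split(PLACEHOLDER)
    out = [parts[0]]
    for arg, rest in zip(args, parts[1:]):
        out.append(arg)
        out.append(rest)
    return "".join(out)


def sort_key(message):
    """Order documents by template first, then by arguments."""
    template, args = decompose(message)
    return (template, args)
```

With these assumptions, `decompose("user=alice id=42 took 15ms")` yields the template `user=alice id=%A took %A` and the arguments `["42", "15ms"]`, and `reconstruct` restores the original string. Sorting a batch with `sorted(messages, key=sort_key)` groups messages that share a template, which is the intent behind the index sort.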
Features in prototype which will be missing from MVP:
- IP/UUID parsing
- separate timestamp handling
- optimized arguments
- template access through separate field