Decompose patterned text message into doc values #128932

Closed
@parkertimmins

Minimal version of the patterned text mapping. Use simple parsing to decompose a log message into template, timestamp, and argument doc values. Use a single inverted index built from the original message. This will likely support only boolean term queries, similar to match_only_text. It could potentially be extended to support (inefficient) phrase queries by using the reconstructed message.

This will largely follow the work in the previous pattern text prototype: #124323

Behavior

  • Input text will be split on a hard-coded list of delimiters.
  • Tokens that contain a digit will be arguments; all other tokens will be part of the template.
  • Template tokens will be joined with a placeholder that stands in for each extracted argument.
  • The template, with placeholders, will be stored in a SortedDocValue.
  • Arguments will be stored in a single SortedSetDocValue.
  • Docs will be sorted by template, then by arguments.
  • The original message text will be indexed, likely without positions.
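
The behavior above can be sketched roughly as follows. This is a minimal Python illustration, not the actual Elasticsearch implementation: the delimiter set, the `<*>` placeholder, and the function names `decompose` and `reconstruct` are all assumptions made for the example, since the issue does not list the real values. Arguments are kept as an ordered list here; how argument order would be preserved in a SortedSetDocValue is not specified in the issue.

```python
import re

# Hypothetical delimiter list and placeholder; the issue does not give the real ones.
DELIMITERS = " \t,;:=[]()"
PLACEHOLDER = "<*>"

# The capturing group makes re.split keep the delimiters in the output.
_SPLIT = re.compile("([" + re.escape(DELIMITERS) + "])")


def decompose(message):
    """Split a message into (template, args).

    Any token containing a digit becomes an argument and is replaced
    by PLACEHOLDER in the template; everything else stays in the template.
    """
    template_parts = []
    args = []
    for piece in _SPLIT.split(message):
        if any(ch.isdigit() for ch in piece):
            args.append(piece)
            template_parts.append(PLACEHOLDER)
        else:
            template_parts.append(piece)
    return "".join(template_parts), args


def reconstruct(template, args):
    """Rebuild the original message by filling placeholders in order.

    Assumes the literal placeholder string never occurs in the raw message.
    """
    literals = template.split(PLACEHOLDER)
    if len(literals) != len(args) + 1:
        raise ValueError("placeholder count does not match argument count")
    out = [literals[0]]
    for arg, literal in zip(args, literals[1:]):
        out.append(arg)
        out.append(literal)
    return "".join(out)


def sort_docs(messages):
    """Order messages by template, then by arguments."""
    return sorted(messages, key=decompose)


if __name__ == "__main__":
    template, args = decompose("user 42 logged in from host-7 at 12:30")
    print(template)
    print(args)
```

Sorting by template first groups messages with the same shape next to each other, so that identical template values sit in adjacent docs, which is presumably the motivation for the sort order described above.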

Features in the prototype that will be missing from the MVP:

  • IP/UUID parsing
  • separate timestamp handling
  • optimized arguments
  • template access through separate field
