@BinaryMuse
Overview

This PR introduces a new API method Parser::parse_sql_with_offsets() that returns parsed statements along with byte offsets into the original source string.

Motivation

I'm using Parser::parse_sql to parse an arbitrary number of statements, and I need to handle execution differently depending on the statement type. However, the canonical representation of a Statement uppercases type names in most cases, so the re-serialized SQL can't be sent to ClickHouse as a query, because ClickHouse type names are case-sensitive: for example, Nullable(Float64) is valid but Nullable(FLOAT64) is not.

parse_sql_with_offsets returns Vec<(Statement, SourceOffset)>, where SourceOffset::start() and ::end() return the byte offsets of the statement from the original query, allowing me to recover the original source of the statement in question:

let result = Parser::parse_sql_with_offsets(&dialect, &sql).unwrap();
for (statement, offset) in result {
    // Borrow this statement's original text from the input string.
    let original_statement_sql = &sql[offset.range()];
}
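
For readers without the diff open, here is a rough sketch of what a type like SourceOffset could look like. It is a stand-in written for illustration, not the code in this PR: the field layout, the `new` constructor, and the `original_text` helper are assumptions; only `start()`, `end()`, and `range()` are named in the description above.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the PR's `SourceOffset`:
/// a half-open byte span `[start, end)` into the original SQL string.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SourceOffset {
    start: usize,
    end: usize,
}

impl SourceOffset {
    /// Assumed constructor; the real type may be built differently.
    pub fn new(start: usize, end: usize) -> Self {
        Self { start, end }
    }

    /// Byte offset of the first byte of the statement.
    pub fn start(&self) -> usize {
        self.start
    }

    /// Byte offset one past the last byte of the statement.
    pub fn end(&self) -> usize {
        self.end
    }

    /// The span as a `Range`, usable for slicing a `str`.
    pub fn range(&self) -> Range<usize> {
        self.start..self.end
    }
}

/// Illustrative helper (not part of the PR): returns the original text for a
/// span, or `None` if the span is out of bounds or splits a UTF-8 character.
pub fn original_text(sql: &str, offset: SourceOffset) -> Option<&str> {
    sql.get(offset.range())
}
```

Using `str::get` rather than indexing avoids a panic if an offset ever lands outside the string or in the middle of a multi-byte character, which may matter when offsets are derived from line/column positions.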

Alternatives

This seems like it would only be useful while work on #1548 is not yet complete, so it's totally reasonable if you'd prefer this PR not to be merged.

Implementation details

  • Add SourceOffset type to track byte positions in source text
  • Add Parser::parse_sql_with_offsets() public API method
  • Add Parser::parse_statements_with_offsets() internal method
  • Add helper function to convert line/column to byte offsets
  • Add comprehensive tests covering single/multiple statements and multiline SQL
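
The line/column conversion is the piece most likely to hide off-by-one or multi-byte bugs, so here is a minimal sketch of how such a helper might work. The function name and signature are hypothetical, and it assumes positions are 1-based with columns counted in characters rather than bytes; the helper in the PR may differ.

```rust
/// Converts a 1-based (line, column) position into a byte offset into `src`.
/// Assumption: columns count Unicode scalar values (chars), not bytes.
/// Returns `None` if the position does not exist in `src`.
pub fn line_col_to_byte_offset(src: &str, line: usize, column: usize) -> Option<usize> {
    if line == 0 || column == 0 {
        return None;
    }

    // Find the byte offset at which the requested line starts.
    let mut line_start = 0usize;
    for _ in 1..line {
        let newline = src[line_start..].find('\n')?;
        line_start += newline + 1;
    }

    // Isolate the text of that line, without its trailing newline.
    let rest = &src[line_start..];
    let line_text = match rest.find('\n') {
        Some(i) => &rest[..i],
        None => rest,
    };

    // Walk `column - 1` characters into the line to get the byte offset.
    let mut chars = line_text.char_indices();
    match chars.nth(column - 1) {
        Some((i, _)) => Some(line_start + i),
        // A column one past the last character maps to the end of the line,
        // which is what an exclusive end position needs.
        None if column - 1 == line_text.chars().count() => {
            Some(line_start + line_text.len())
        }
        None => None,
    }
}
```

With the input `"SELECT 1;\nSELECT 'é' AS c;"`, line 2 column 1 maps to byte 10, and line 2 column 11 maps to byte 21 rather than 20, because `é` occupies two bytes in UTF-8.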

Introduces a new API method `Parser::parse_sql_with_offsets()` that
returns parsed statements along with byte offsets into the original
source string. This allows users to recover the exact original text
for each statement, which is useful for preserving case-sensitive
identifiers and type names that may be normalized in the AST.

- Add `SourceOffset` type to track byte positions in source text
- Add `Parser::parse_sql_with_offsets()` public API method
- Add `Parser::parse_statements_with_offsets()` internal method
- Add helper function to convert line/column to byte offsets
- Add comprehensive tests covering single/multiple statements,
  case-sensitive type names, and multiline SQL

This is particularly useful for dialects like ClickHouse where type
names are case-sensitive (e.g., `Nullable(Float64)` vs `Nullable(FLOAT64)`).
@BinaryMuse BinaryMuse force-pushed the mkt/parse-sql-with-byte-offsets branch from a075c25 to 0d47a11 Compare November 7, 2025 00:45