DateTimeComponents.Format.parse() should parse timeZoneId() using the Temporal grammar instead of checking if the time zone exists in the time zone database #532

Merged
merged 33 commits into from
Jun 2, 2025

Conversation

DmitryNekrasov
Contributor

@DmitryNekrasov DmitryNekrasov commented May 29, 2025

Summary

This PR changes how DateTimeComponents.Format.parse() parses time zone identifiers: it follows the grammar from the Temporal specification instead of requiring that the parsed time zone exist in the system's time zone database. Offset time zones are now parsed with the same grammar-based approach.

The implementation follows the Temporal grammar for time zones, which defines syntactic rules for valid time zone identifiers:

time-zone-initial = ALPHA / "." / "_"
time-zone-char    = time-zone-initial / DIGIT / "-" / "+"
time-zone-part    = time-zone-initial *time-zone-char
time-zone-name    = time-zone-part *("/" time-zone-part)
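As a rough illustration of what these four productions accept, here is a minimal Kotlin sketch that checks a string against them one production at a time. The function names are hypothetical and are not taken from this PR; the sketch covers only the productions quoted above, so offset forms such as "+01:00" are outside its scope, and ALPHA and DIGIT are assumed to be ASCII-only as in the ABNF core rules.

```kotlin
// time-zone-initial = ALPHA / "." / "_"   (ASCII letters only, by assumption)
fun isTimeZoneInitial(c: Char): Boolean =
    c in 'a'..'z' || c in 'A'..'Z' || c == '.' || c == '_'

// time-zone-char = time-zone-initial / DIGIT / "-" / "+"
fun isTimeZoneChar(c: Char): Boolean =
    isTimeZoneInitial(c) || c in '0'..'9' || c == '-' || c == '+'

// time-zone-part = time-zone-initial *time-zone-char
fun isTimeZonePart(part: String): Boolean =
    part.isNotEmpty() && isTimeZoneInitial(part[0]) && part.all(::isTimeZoneChar)

// time-zone-name = time-zone-part *("/" time-zone-part)
// Splitting on "/" yields an empty part for a leading, trailing, or doubled
// slash, and an empty part is rejected by isTimeZonePart.
fun isTimeZoneName(id: String): Boolean =
    id.split('/').all(::isTimeZonePart)
```

Under this sketch, "Custom/TimeZone" is accepted even though no such zone exists, because the check is purely syntactic.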

Previously, when parsing strings containing time zone IDs using DateTimeComponents.Format.parse() with timeZoneId(), the parser validated that time zones existed in the system's database. Additionally, offset timezone validation used a separate finite state automaton that followed different parsing rules.

This PR unifies the parsing approach by replacing the previous offset timezone validation FSA with a comprehensive implementation that handles both offset formats and named time zones according to Temporal grammar specifications. The new finite state automaton processes input character by character, transitioning between states based on grammar productions for all valid time zone identifiers, whether they are named zones like "America/New_York" or offset formats like "+01:00".

[Figure: all-in-one-automata]
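To make the character-by-character idea concrete, here is a toy automaton in Kotlin. It is only a sketch and not the implementation from this PR: the state names are invented, and the offset branch assumes a single `±HH:MM` shape without range-checking hours or minutes, whereas the real automaton may accept more offset forms.

```kotlin
// Illustrative states; PART and M2 are the accepting ones.
enum class State { START, PART, AFTER_SLASH, SIGN, H1, H2, COLON, M1, M2, FAIL }

fun isInitial(c: Char): Boolean = c in 'a'..'z' || c in 'A'..'Z' || c == '.' || c == '_'
fun isAsciiDigit(c: Char): Boolean = c in '0'..'9'

// One transition of the automaton: current state plus next character.
fun step(state: State, c: Char): State = when (state) {
    State.START -> when {
        isInitial(c) -> State.PART              // start of a named zone
        c == '+' || c == '-' -> State.SIGN      // start of an offset (assumed shape)
        else -> State.FAIL
    }
    State.PART -> when {
        isInitial(c) || isAsciiDigit(c) || c == '-' || c == '+' -> State.PART
        c == '/' -> State.AFTER_SLASH
        else -> State.FAIL
    }
    State.AFTER_SLASH -> if (isInitial(c)) State.PART else State.FAIL
    State.SIGN -> if (isAsciiDigit(c)) State.H1 else State.FAIL
    State.H1 -> if (isAsciiDigit(c)) State.H2 else State.FAIL
    State.H2 -> if (c == ':') State.COLON else State.FAIL
    State.COLON -> if (isAsciiDigit(c)) State.M1 else State.FAIL
    State.M1 -> if (isAsciiDigit(c)) State.M2 else State.FAIL
    State.M2, State.FAIL -> State.FAIL          // nothing may follow a complete offset
}

// Runs the automaton over the whole input and checks for an accepting state.
fun isTimeZoneId(input: String): Boolean {
    val end = input.fold(State.START, ::step)
    return end == State.PART || end == State.M2
}
```

A single transition function keeps named zones and offsets under the same rules, which is the design point the description makes about replacing the separate offset automaton.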

The unified parser accepts any syntactically valid time zone identifier according to the Temporal specification and defers actual time zone validation to the point of usage, such as when creating a TimeZone object. This consistent approach ensures that both named time zones and offset formats follow the same validation principles, improving code maintainability and specification compliance.

This change relaxes parsing constraints across all time zone formats: time zone IDs that were accepted before continue to parse. It is still a behavior change for code that relied on parse-time validation of time zone existence, which now has to perform that validation separately. Syntactically valid time zone IDs that were previously rejected now parse successfully, and the error surfaces later, when the invalid time zone is actually used.

The test suite has been expanded to verify parsing of various time zone formats according to Temporal grammar, including both named zones and offset formats. Tests ensure that previously valid time zones continue to parse correctly while also covering edge cases and properly rejecting malformed identifiers. The unified finite state automaton implementation has been thoroughly tested against the complete grammar specification to ensure compliance across all supported time zone formats.

Example

// `format` is assumed to be a DateTimeComponents.Format that includes timeZoneId()

// Before: this call failed at parse time if "Custom/TimeZone" was not in the database
// val parsed = format.parse("2024-01-01T12:00:00[Custom/TimeZone]")

// After: the same call parses successfully; the lookup is deferred to TimeZone.of()
val parsed = format.parse("2024-01-01T12:00:00[Custom/TimeZone]")
val timeZone = TimeZone.of(parsed.timeZoneId!!)  // Validation happens here
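Callers that previously relied on parse-time validation could wrap the deferred lookup in a small helper. The following is a minimal sketch, assuming kotlinx-datetime is on the classpath; `resolveTimeZoneOrNull` is a hypothetical name and not part of the library.

```kotlin
import kotlinx.datetime.IllegalTimeZoneException
import kotlinx.datetime.TimeZone

// Hypothetical helper: takes an ID that has already passed syntactic parsing
// (for example DateTimeComponents.timeZoneId) and performs the database lookup.
// Returns null when the ID is absent or does not name a known time zone.
fun resolveTimeZoneOrNull(id: String?): TimeZone? {
    if (id == null) return null
    return try {
        TimeZone.of(id)
    } catch (e: IllegalTimeZoneException) {
        null
    }
}
```

With such a helper, `resolveTimeZoneOrNull(parsed.timeZoneId)` replaces the `!!` in the example above and turns a missing or unknown zone into a null instead of an exception.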

…ation.

Extracted common timezone validation logic into an abstract `TimezoneParserOperation` class. `OffsetTimezoneParserOperation` now extends this class.
Introduced NamedTimezoneParserOperation to handle named time zone validation within parsing structures. Refactored TimeZoneIdDirective to simplify its implementation by removing the knownZones property and incorporating the new parser operation for named time zones, enhancing flexibility and maintainability.
The `StringFieldFormatDirective` class was removed as it was no longer being used.
Replaced the `TODO` implementation in `NamedTimezoneParserOperation` with state-based validation logic to handle complex timezone formats.
Replaced `isDigit` with `isAsciiDigit` in timezone validation logic to ensure stricter control over accepted characters.
@DmitryNekrasov DmitryNekrasov self-assigned this May 29, 2025
@DmitryNekrasov DmitryNekrasov added the timezone The model and API of timezones label May 29, 2025
@DmitryNekrasov DmitryNekrasov marked this pull request as ready for review May 30, 2025 09:55
Collaborator

@dkhalanskyjb dkhalanskyjb left a comment

The documentation for kotlinx.datetime.format.DateTimeFormatBuilder.WithDateTimeComponents#timeZoneId is no longer valid with this. Please update it.

@dkhalanskyjb dkhalanskyjb self-requested a review June 2, 2025 11:00
@dkhalanskyjb dkhalanskyjb self-requested a review June 2, 2025 11:08
Collaborator

@dkhalanskyjb dkhalanskyjb left a comment

Good job! The documentation could be improved a bit, and I have a few stylistic suggestions, but other than that, the state machine looks correct and well implemented.

@DmitryNekrasov DmitryNekrasov merged commit 3e68613 into master Jun 2, 2025
1 check passed
@DmitryNekrasov DmitryNekrasov deleted the dmitry.nekrasov/feature/531 branch June 2, 2025 14:22
dkhalanskyjb pushed a commit that referenced this pull request Jun 11, 2025
Replace existing validation with a unified finite state automaton that implements RFC 9557 grammar for all time zone identifiers (both named zones and offsets). Parse-time validation is removed; time zones are now validated only when used.

Syntactically valid but non-existent time zones will now parse successfully. Validation errors occur when creating TimeZone objects, not during parsing.

Fixes #531
Labels
timezone The model and API of timezones
2 participants