DateTimeComponents.Format.parse() should parse timeZoneId() using the Temporal grammar instead of checking if the time zone exists in the time zone database #532
…ation. Extracted common timezone validation logic into an abstract `TimezoneParserOperation` class. `OffsetTimezoneParserOperation` now extends this class.
Introduced `NamedTimezoneParserOperation` to handle named time zone validation within parsing structures. Refactored `TimeZoneIdDirective` to simplify its implementation by removing the `knownZones` property and using the new parser operation for named time zones.
Removed the `StringFieldFormatDirective` class, as it was no longer used.
Replaced the `TODO` implementation in `NamedTimezoneParserOperation` with state-based validation logic to handle complex timezone formats.
Replaced `isDigit` with `isAsciiDigit` in timezone validation logic to ensure stricter control over accepted characters.
Removed a test case that was unnecessary: it tested an invalid time zone scenario that the core parsing logic already handles elsewhere.
The documentation for `kotlinx.datetime.format.DateTimeFormatBuilder.WithDateTimeComponents#timeZoneId` is no longer valid with this change. Please update it.
DateTimeComponents.Format { timeZoneId(); char('/') }.parse("$zoneId/")
…atBuilder documentation.
Good job! The documentation could be improved a bit, and I have a few stylistic suggestions, but other than that, the state machine looks correct and well implemented.
….onFalse` for cleaner and more concise logic.
… and consistency.
…e parsing tests
…undant bracket parsing test
Replace existing validation with a unified finite state automaton that implements RFC 9557 grammar for all time zone identifiers (both named zones and offsets). Parse-time validation is removed; time zones are now validated only when used. Syntactically valid but non-existent time zones will now parse successfully. Validation errors occur when creating TimeZone objects, not during parsing. Fixes #531
Summary
This PR transforms how `DateTimeComponents.Format.parse()` handles time zone parsing by implementing the Temporal specification's grammar-based approach, which removes the requirement that parsed time zones must exist in the system's time zone database and extends this same parsing philosophy to offset time zones.

The implementation follows the Temporal grammar for time zones, which defines syntactic rules for valid time zone identifiers:
Previously, when parsing strings containing time zone IDs using `DateTimeComponents.Format.parse()` with `timeZoneId()`, the parser validated that time zones existed in the system's database. Additionally, offset timezone validation used a separate finite state automaton that followed different parsing rules.

This PR unifies the parsing approach by replacing the previous offset timezone validation FSA with a comprehensive implementation that handles both offset formats and named time zones according to Temporal grammar specifications. The new finite state automaton processes input character by character, transitioning between states based on grammar productions for all valid time zone identifiers, whether they are named zones like "America/New_York" or offset formats like "+01:00".
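
To make the character-by-character idea concrete, here is a minimal sketch of such an automaton for the named-zone half of the grammar only. It is not the PR's `NamedTimezoneParserOperation`. The names `ZoneState` and `matchNamedZone` are hypothetical, and the character classes follow the RFC 9557 `time-zone-name` rule as I understand it: parts separated by `/`, each starting with an ASCII letter, `.`, or `_`, and continuing with those characters, ASCII digits, `-`, or `+`. Offsets and the rule that forbids `.` and `..` as parts are left out for brevity.

```kotlin
// Hypothetical sketch: NOT the PR's actual implementation.
// Recognises only named zones; offsets such as "+01:00" are not handled here.

private enum class ZoneState { PART_START, IN_PART }

private fun isAsciiLetter(c: Char) = c in 'a'..'z' || c in 'A'..'Z'
private fun isAsciiDigit(c: Char) = c in '0'..'9'

// First character of a name part: ASCII letter, '.', or '_'.
private fun isInitial(c: Char) = isAsciiLetter(c) || c == '.' || c == '_'

// Any later character of a name part: an initial character, ASCII digit, '-', or '+'.
private fun isPartChar(c: Char) = isInitial(c) || isAsciiDigit(c) || c == '-' || c == '+'

/**
 * Returns the length of the longest prefix of [input], starting at [start],
 * that is a syntactically valid named zone, or 0 if there is none.
 * No time zone database lookup is performed.
 */
fun matchNamedZone(input: String, start: Int = 0): Int {
    var state = ZoneState.PART_START
    var accepted = 0 // length of the longest prefix that ended in an accepting state
    var i = start
    while (i < input.length) {
        val c = input[i]
        val next: ZoneState? = when (state) {
            ZoneState.PART_START -> if (isInitial(c)) ZoneState.IN_PART else null
            ZoneState.IN_PART -> when {
                isPartChar(c) -> ZoneState.IN_PART
                c == '/' -> ZoneState.PART_START // separator: a new part must follow
                else -> null
            }
        }
        if (next == null) break // no transition: stop and report the last accepted prefix
        state = next
        i++
        // Only IN_PART is accepting; a trailing '/' is consumed but never accepted.
        if (state == ZoneState.IN_PART) accepted = i - start
    }
    return accepted
}
```

Under these assumptions, `matchNamedZone("Mars/Olympus_Mons")` accepts the whole string even though no such zone exists, while `matchNamedZone("Europe/Berlin/")` stops before the trailing slash, so a following `char('/')` directive could still consume it.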
The unified parser accepts any syntactically valid time zone identifier according to the Temporal specification and defers actual time zone validation to the point of usage, such as when creating a `TimeZone` object. This consistent approach ensures that both named time zones and offset formats follow the same validation principles, improving code maintainability and specification compliance.

This change relaxes parsing constraints across all time zone formats. Time zones that were accepted before continue to parse, but code that previously relied on parse-time validation of time zone existence will need to handle validation separately. Syntactically valid time zone IDs that were previously rejected will now parse successfully, with validation errors occurring later when attempting to use invalid time zones.
The test suite has been expanded to verify parsing of various time zone formats according to Temporal grammar, including both named zones and offset formats. Tests ensure that previously valid time zones continue to parse correctly while also covering edge cases and properly rejecting malformed identifiers. The unified finite state automaton implementation has been thoroughly tested against the complete grammar specification to ensure compliance across all supported time zone formats.
Example