-
Notifications
You must be signed in to change notification settings - Fork 115
DateTimeComponents.Format parse throws an exception when timeZoneId() is UTC #528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@dkhalanskyjb Dmitry, hello! Could you please look at the added tests, do they correctly document the expected behavior and fully cover possible scenarios? Thank you! |
…eId) } to SHOULD_PARSE_INCORRECTLY case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should do the trick, I only have minor suggestions. Please squash when merging and mention Fixes #444
in the full commit message.
When squash-merging, we also leave the |
TimeZone parser implementation using Finite State Automaton
This PR introduces a comprehensive parser for timezone identifiers built around a finite state automaton that processes input character by character through well-defined state transitions.
The parser recognizes several categories of timezone formats. Named timezone identifiers (UTC, GMT, UT, Z, z). Fixed offset formats support multiple notation styles ranging from compact single-digit representations (+1, -5) to fully specified formats with colons (+01:02:03, -05:30:15). Combined formats could merge named prefixes with offsets (UTC+01:00, GMT-05:30:45).
The finite state automaton architecture employs six distinct parsing states that guide the recognition process. The parser begins in the START state, recognizing either immediate completion tokens (Z/z), sign indicators leading to AFTER_SIGN, or timezone prefixes transitioning to AFTER_PREFIX. From AFTER_PREFIX, only sign indicators are valid, advancing to AFTER_SIGN. The AFTER_SIGN state processes either two-digit hour components leading to AFTER_HOUR or single-digit shortcuts that complete parsing immediately. The AFTER_HOUR state handles either colon-separated minute components transitioning to AFTER_COLON_MINUTE or direct two-digit minute parsing advancing to AFTER_MINUTE. Both AFTER_MINUTE and AFTER_COLON_MINUTE states can process additional time components, with AFTER_COLON_MINUTE specifically handling colon-prefixed seconds before reaching the END state.
A significant architectural decision involves the removal of temporal validation constraints, allowing the parser to accept mathematically invalid but syntactically correct formats such as +99:99:99. This design choice prioritizes format recognition over semantic validation, enabling downstream components to handle temporal logic validation as needed.
Comprehensive test coverage validates the parser's behavior across diverse scenarios, including edge cases involving malformed inputs, boundary conditions, and format variations. The tests confirm correct acceptance of valid timezone identifiers while appropriately rejecting invalid formats.