Skip to content

GitPython parses RFC 2822 dates with timezone incorrectly #1247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
yanniszark opened this issue May 20, 2021 · 6 comments
Open

GitPython parses RFC 2822 dates with timezone incorrectly #1247

yanniszark opened this issue May 20, 2021 · 6 comments

Comments

@yanniszark
Copy link

Bug Report

Description

The parse_date function returns a wrong result, when using an RFC 2822 date with timezone.

In [24]: parse_date('Thu, 20 May 2021 13:38:44 +0300')
Out[24]: (1621517924, -10800)

This output is wrong.
1621517924 is supposed to be UTC time (10:38:44).
But instead, it is my time (13:38:44)

Source Code Investigation

The function calls are as following:

  1. repo.index.commit takes author_date as input, which is an RFC 2822 string.
  2. repo.index.commit calls Commit.create_from_tree with author_date.
  3. Commit.create_from_tree calls parse_date(author_date) to parse the date into author_time and author_offset.
  4. Commit.create_from_treecalls cls (aka Commit.__init__) passing author_time and author_offset.
  5. Commit.__init__ saves author_time and author_offset in authored_date and author_tz_offset.

Thus, the fault should lie with the parse_date function.

Environment

Tried with versions 2.1.11 and 3.1.14.

@Byron
Copy link
Member

Byron commented May 20, 2021

Thanks for the report!
With the given example, I could reproduce the issue and validate that GitPython essentially does the opposite. It returns the original time and inverts the offset, so that original_time + offset yields the UTC time.

The respective source is here, and I think it's very old too. I am also very sure that a lot of code depends on this behaviour so fixing it to match the docs will be a breaking change.

Instead of that I propose to change the documentation instead. What do you think?

@yanniszark
Copy link
Author

Hi @Byron,

Thanks a lot for the quick response!

It returns the original time and inverts the offset, so that original_time + offset yields the UTC time.

I am using GitPython to make a commit and specify a date in RFC 2822 (date taken from another commit).
The original date (as shown in git log) is: Fri May 21 12:45:30 2021 +0300
The resulting date (as shown in git log) is: Fri May 21 15:45:30 2021 +0300

Thus, when creating a commit with an RFC 2822 date, the resulting date is incorrect, it's not that the timezone has changed to balance the changed time.

The respective source is here, and I think it's very old too. I am also very sure that a lot of code depends on this behaviour so fixing it to match the docs will be a breaking change.

I'm not sure about the utctz_to_altz function and where it is used.
However, I think we should fix parse_date to correctly parse RFC 2822 dates, because that is clearly broken.
We can do that without changing the helper function utctz_to_altz, if you believe it will break other places.

@yanniszark
Copy link
Author

To make my case more clear, here is a minimum reproducer:

import os
import git
import tempfile

from pathlib import Path
from email import utils as email_utils

tmp_dir = tempfile.mkdtemp()
print("Repo folder: '%s'" % tmp_dir)
repo = git.Repo.init(tmp_dir)
Path(os.path.join(repo.working_dir, "example_file")).touch()
repo.index.add(["example_file"])
author = git.Actor("user", "[email protected]")
date = "Thu, 20 May 2021 13:38:44 +0300"
print("Input date: '%s'" % date)
repo.index.commit("Wrong date", author=author, committer=author,
                  author_date=date, commit_date=date)
print("Output date: '%s'" %
      email_utils.format_datetime(repo.head.commit.authored_datetime))

Which outputs:

Repo folder: '/tmp/tmp2x6_v43g'
Input date: 'Thu, 20 May 2021 13:38:44 +0300'
Output date: 'Thu, 20 May 2021 16:38:44 +0300'

@Byron
Copy link
Member

Byron commented May 22, 2021

Thanks for making the issue easily reproducible. I have taken a look and noticed that the date in the commit arrives correctly, that's a pleasant surprise, and it can be expected that reading the same date both from the newly created commit (by re-reading it or by using the return value of commit(…)).

My initial desire to not fix such a long-standing bug in fear of breakage makes the assumption that a lot of code depends on it. A way to find it would be to fix this issue and see how many new issues show up - in the worst case the release can be yanked to find another way of introducing a fix.

Thanks a lot for your engagement here to help making this change, I do hope you would be willing to help fixing it too. Please note that if you do and no existing test breaks, it's desirable to add a new test for that.

@AA-Turner
Copy link

Recording for posterity: This is also broken for timezones with half-hour offsets (such as Indian Standard Time):

>>> from git.objects.util import utctz_to_altz
>>> utctz_to_altz('+0100') / -3600
1.0
>>> utctz_to_altz('+0530') / -3600
5.3

@Byron
Copy link
Member

Byron commented Apr 19, 2025

Here is a fix as produced by an LLM:

def utctz_to_altz(utctz: str) -> int:
    """Convert a git timezone offset into a timezone offset west of UTC in seconds
    (compatible with :attr:`time.altzone`).

    :param utctz:
        git utc timezone string, e.g. +0200
    """
    sign = 1 if utctz[0] == '+' else -1
    hours = int(utctz[1:3])
    minutes = int(utctz[3:5])

    seconds = (hours * 3600) + (minutes * 60)
    return sign * seconds

However, one would probably also want to add some tests to be sure it handles varying inputs gracefully.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants