Parse unix timestamps consistently regardless of timezones #954
Otherwise datetime.fromtimestamp will return a naive datetime in the local timezone and applying timezone modifications later with TIMEZONE and TO_TIMEZONE won't do the right thing. Parsing a unix timestamp value should always result in the exact same instant in time regardless of current time zone, so it shouldn't matter what the current TIMEZONE setting is, whether 'UTC', 'local', any other timezone, or even unset.
Codecov Report
@@ Coverage Diff @@
## master #954 +/- ##
=======================================
Coverage 98.29% 98.29%
=======================================
Files 234 234
Lines 2702 2705 +3
=======================================
+ Hits 2656 2659 +3
Misses 46 46
By the way, when I was adding tests, I tried to follow the pattern of the other tests in
But, when I do that, it seems like I end up modifying the global |
bc5ae4d to 3e8b347
@Gallaecio Is this good to merge?
Great work!
TBH, I am not able to understand the issue here. Maybe an example may help. But if we are applying the timezone and stripping it off, to eventually apply the timezone from settings through Also the statement in the initial explanation
Where do we see that behavior, since the |
The examples in the added tests should show the issue. You could try running those tests on the main line code without my changes.
The problem is that there's not just one time zone that's being stripped and reapplied. The date is initially constructed with the time zone of the system (regardless of any dateparser settings), but then that time zone is not kept on the datetime. Then later in the code, the datetime is interpreted as if it were in dateparser's settings.TIMEZONE. If the system's time zone happens to be the same as the one used for dateparser's settings, then I think everything is fine. But if my system is set to 'America/New_York', and I set the dateparser settings time zone to 'UTC' and ask it to parse '1661996156', I believe dateparser will currently give me 'August 31, 2022 9:35:56 PM UTC' when it should give me 'September 1, 2022 1:35:56 AM UTC'. Sorry if there are any errors in the above comment, I'm on mobile and it's also been a while since I first made this branch. Please test the examples and see if what I'm saying makes sense.
Yes, but with the new code, the naive datetime object is created in the same time zone as settings.TIMEZONE, so that later, when it's interpreted as if it were in settings.TIMEZONE, that's now a correct assumption. With the old code, the naive datetime is created relative to the system's time zone.
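To make the difference concrete, here is a minimal stdlib-only sketch of the two behaviours described above. It does not call dateparser at all. SYSTEM_TZ and SETTINGS_TZ are illustrative stand-ins (assumptions, not dateparser names): a fixed UTC-4 offset plays the role of a system set to 'America/New_York' in summer, and UTC plays the role of settings.TIMEZONE.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP = 1661996156

# Assumption: fixed UTC-4 offset standing in for a system zone of
# 'America/New_York' on this date (EDT).
SYSTEM_TZ = timezone(timedelta(hours=-4))
# Assumption: stand-in for dateparser's settings.TIMEZONE = 'UTC'.
SETTINGS_TZ = timezone.utc

# Old behaviour (simulated): wall-clock time in the system zone, tzinfo dropped,
# then the naive value is treated as if it were in the settings zone.
naive_system = datetime.fromtimestamp(TIMESTAMP, SYSTEM_TZ).replace(tzinfo=None)
wrong = naive_system.replace(tzinfo=SETTINGS_TZ)

# New behaviour (simulated): wall-clock time in the settings zone, tzinfo dropped,
# then the naive value is treated as being in that same zone.
naive_settings = datetime.fromtimestamp(TIMESTAMP, SETTINGS_TZ).replace(tzinfo=None)
right = naive_settings.replace(tzinfo=SETTINGS_TZ)

print(wrong.isoformat())   # 2022-08-31T21:35:56+00:00
print(right.isoformat())   # 2022-09-01T01:35:56+00:00
print(int(wrong.timestamp()) == TIMESTAMP)  # False
print(int(right.timestamp()) == TIMESTAMP)  # True
```

The "wrong" value is off by exactly the system's UTC offset (four hours here), which matches the August 31 vs. September 1 discrepancy mentioned above.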
Here's an example. With the current code:
The returned unix timestamp doesn't match the original timestamp. And with my branch:
It does match.
Thanks @onlynone! It's clear now. I understand where the difference comes from. As far as I understood, previous to, when the date goes to the |
@gutsytechster @Gallaecio I think everything has been resolved, can it be merged?
Thank you!
Using an old version of |
When dateparser parses a unix timestamp, and the user has set a TIMEZONE that is not 'local' (or the same as the user's system local timezone), it gets the wrong time. This is because datetime.fromtimestamp(timestamp) will return a datetime in the user's local timezone as configured by their system (influenced by the TZ env var, /etc/localtime, etc). But, it will be a naive datetime without any timezone information. The dateparser code will try to interpret that datetime object as if it was in the settings.TIMEZONE timezone, which is incorrect.

The fix is to use datetime.fromtimestamp(timestamp, timezone_that_matches_settings_TIMEZONE) to get a timezone aware datetime object with a timezone that matches dateparser's settings.TIMEZONE (or use the local timezone if that is unset). Then strip the tzinfo from that datetime object, because apply_timezone_from_settings(date_obj, settings) requires date_obj to be a naive datetime without timezone information. So when apply_timezone_from_settings interprets that object, it does so correctly.
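
The shape of the fix can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code from this branch: naive_from_timestamp and interpret_in_zone are hypothetical helpers, the latter only standing in for the later step that treats a naive datetime as being in settings.TIMEZONE, and the fixed-offset zones are arbitrary examples.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def naive_from_timestamp(timestamp: float, zone: tzinfo) -> datetime:
    """Hypothetical helper: wall-clock time of `timestamp` in `zone`, tzinfo stripped."""
    return datetime.fromtimestamp(timestamp, zone).replace(tzinfo=None)


def interpret_in_zone(naive: datetime, zone: tzinfo) -> datetime:
    """Hypothetical stand-in for the later step that treats a naive
    datetime as if it were expressed in `zone`."""
    return naive.replace(tzinfo=zone)


TIMESTAMP = 1661996156

# Arbitrary example zones standing in for different settings.TIMEZONE values.
zones = [
    timezone.utc,
    timezone(timedelta(hours=-4)),
    timezone(timedelta(hours=5, minutes=30)),
]

# Building the naive datetime in the same zone it is later interpreted in
# means the original instant survives the round trip, whatever the zone.
round_trips = [
    int(interpret_in_zone(naive_from_timestamp(TIMESTAMP, zone), zone).timestamp())
    for zone in zones
]
print(round_trips)  # [1661996156, 1661996156, 1661996156]
```

Because the same zone is used on both sides of the strip-and-reapply step, the result no longer depends on the system's own time zone.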