feat(python): Allow %f in strptime format strings #8404
hey @stinodego it's because it differs from the Python stdlib and so is a recipe for unexpected behaviour, e.g. compare:
In [1]: pl.Series(['2020-01-01T00:00:00.12']).str.strptime(pl.Datetime, '%Y-%m-%dT%H:%M:%S.%f')
Out[1]:
shape: (1,)
Series: '' [datetime[ns]]
[
2020-01-01 00:00:00.000000012
]
In [2]: datetime.strptime('2020-01-01T00:00:00.12', '%Y-%m-%dT%H:%M:%S.%f')
Out[2]: datetime.datetime(2020, 1, 1, 0, 0, 0, 120000)
whereas the expected behaviour was almost certainly
In [2]: datetime.strptime('2020-01-01T00:00:00.12', '%Y-%m-%dT%H:%M:%S.%f')
Out[2]: datetime.datetime(2020, 1, 1, 0, 0, 0, 120000)
In [3]: pl.Series(['2020-01-01T00:00:00.12']).str.strptime(pl.Datetime, '%Y-%m-%dT%H:%M:%S%.f')
Out[3]:
shape: (1,)
Series: '' [datetime[ns]]
[
2020-01-01 00:00:00.120
]
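The two readings of the digits after the decimal point can be reproduced with the standard library alone. This is a minimal sketch, assuming that chrono's `%f` treats the digits as an integer count of nanoseconds (as the first output above suggests) while the stdlib treats them as a decimal fraction of a second. The helper names `digits_as_fraction_ns` and `digits_as_integer_ns` are hypothetical and exist only for illustration.

```python
from datetime import datetime


def digits_as_fraction_ns(digits: str) -> int:
    """Read the digits as a decimal fraction of a second, returned in nanoseconds."""
    # Right-pad to nine digits, so "12" becomes "120000000" (0.12 s).
    return int(digits.ljust(9, "0")[:9])


def digits_as_integer_ns(digits: str) -> int:
    """Read the digits as a plain integer number of nanoseconds."""
    return int(digits)


# The stdlib reads ".12" as a fraction: 0.12 s is 120000 microseconds.
parsed = datetime.strptime("2020-01-01T00:00:00.12", "%Y-%m-%dT%H:%M:%S.%f")
fraction_ns = digits_as_fraction_ns("12")
integer_ns = digits_as_integer_ns("12")
```

The gap between `fraction_ns` and `integer_ns` is the surprise discussed in this thread: the same two digits mean either 0.12 seconds or 12 nanoseconds depending on which convention the parser follows.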
I see. The chrono crate does very clearly document this behaviour, however. So it feels strange to disable this functionality, which might be useful to some users, purely because some other users do not read the docs and may get unexpected behavior. We clearly link to the chrono docs, we may even add a warning that It also introduced differences between Python Polars and Rust Polars. Do you have an opinion here, @ritchie46?
You're correct that it's documented. I'll give a pandas experience example if I may: the behaviour in this issue had been clearly documented for at least a decade, as had the supported workaround (specify There's already been one issue on GitHub and one post on Discord about unexpected behaviour due to this, it's already biting people. I don't think anyone reads "10.5 seconds" and thinks "ah, that's 10 seconds and 5 nanoseconds", but that's exactly what
In [15]: pl.Series(['2020-01-01T00:00:10.5']).str.strptime(pl.Datetime, '%Y-%m-%dT%H:%M:%S.%f')
Out[15]:
shape: (1,)
Series: '' [datetime[ns]]
[
2020-01-01 00:00:10.000000005
]
Thanks for questioning the decision anyway, it's good to talk it through!
Here's some example data where I would like to use
s = pl.Series(["05:10:10+ns12345", "05:10:10+ns1234567"])
result = s.str.strptime(pl.Time, "%H:%M:%S+ns%f")
Resulting in:
The other options So now we've disabled some perfectly good functionality, that the The chrono formats are actually very clear and well thought out. In the formats that you present as confusing, it's seconds followed by a period followed by some numbers. In other words, decimals. Then you should use So in this case, I'd vouch for educating Python users in the (perhaps superior) chrono format, rather than disabling perfectly good functionality. That does leave open the question of how to teach this properly, which is not easy as you have stated...
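For data shaped like the example in this comment, where the trailing digits are an integer nanosecond count and not a decimal fraction, the intended reading can be sketched without Polars. The helper `parse_time_with_ns` and its regular expression are hypothetical; they assume the literal `+ns` separator from the example strings and make no claim about how chrono or Polars implement `%f`.

```python
import re
from datetime import time

# Two-digit hour, minute, and second, then a literal "+ns" and an integer.
PATTERN = re.compile(r"(\d{2}):(\d{2}):(\d{2})\+ns(\d+)")


def parse_time_with_ns(text: str) -> tuple[time, int]:
    """Split a string into a wall-clock time and an integer nanosecond count."""
    match = PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unexpected format: {text!r}")
    hour, minute, second, nanos = (int(group) for group in match.groups())
    return time(hour, minute, second), nanos


first = parse_time_with_ns("05:10:10+ns12345")
second = parse_time_with_ns("05:10:10+ns1234567")
```

Here the digits are never padded or scaled, which is the reading that an integer-nanosecond `%f` is useful for.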
that's a good example, thanks! maybe it's just
That would be better than the existing check. However, even in this case, you're making assumptions for the user. Maybe they really intended to use Maybe a compromise would be to throw a warning in this case. We can define a custom Python warning Does that seem like a good solution to you?
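As a rough sketch of what such a compromise could look like: the class name `ChronoFormatWarning`, the helper `warn_on_dot_f`, and the message text below are all hypothetical and not taken from the PR. The check assumes the pattern of interest is a literal period directly followed by `%f`, as in the format strings discussed above.

```python
import warnings


class ChronoFormatWarning(UserWarning):
    """Hypothetical warning category for potentially confusing format strings."""


def warn_on_dot_f(fmt: str) -> None:
    """Warn, without raising, when a format string contains '.%f'."""
    if ".%f" in fmt:
        warnings.warn(
            "Detected '.%f' in the format string; the digits may be parsed as "
            "nanoseconds and not as fractional seconds. Did you mean '%.f'?",
            ChronoFormatWarning,
            stacklevel=2,
        )


# Record warnings so the behaviour can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_on_dot_f("%Y-%m-%dT%H:%M:%S.%f")  # warns
    warn_on_dot_f("%Y-%m-%dT%H:%M:%S%.f")  # silent
    warn_on_dot_f("%H:%M:%S+ns%f")         # silent
```

Because it is a warning and not an error, users who really do want the integer reading after a period can silence it with the standard `warnings` filters.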
Yup, nice one! Thanks for the discussion, that sounds like the best solution to me
Great! I'll update the PR accordingly 👍 thanks for the insights!
e23276c to 2aa84b0
Thanks for doing this, it's a better solution! Over to Ritchie
Thanks for the discourse/breakdown. Interesting read. And we have a better solution now! 💯 Great stuff. 👍
I looked into this and I could not find a reason why this would not be permitted. I updated the tests and it all works as expected.
Pinging @MarcoGorelli as I expect he introduced this check - could you elaborate?
I also removed a check on the dtype input - this is already handled by the if/elif/else statement.