isoformat() / fromisoformat() for datetime.timedelta #86260

Open
ErikCederstrand mannequin opened this issue Oct 20, 2020 · 23 comments
Labels: extension-modules (C modules in the Modules dir), stdlib (Python modules in the Lib dir), type-feature (a feature request or enhancement)

Comments

@ErikCederstrand
Mannequin

ErikCederstrand mannequin commented Oct 20, 2020

BPO 42094
Nosy @vadmium, @pganssle

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = None
created_at = <Date 2020-10-20.07:13:53.625>
labels = ['library', '3.10']
title = 'isoformat() / fromisoformat() for datetime.timedelta'
updated_at = <Date 2020-11-18.06:00:30.213>
user = 'https://bugs.python.org/ErikCederstrand'

bugs.python.org fields:

activity = <Date 2020-11-18.06:00:30.213>
actor = 'Erik Cederstrand'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Library (Lib)']
creation = <Date 2020-10-20.07:13:53.625>
creator = 'Erik Cederstrand'
dependencies = []
files = []
hgrepos = []
issue_num = 42094
keywords = []
message_count = 5.0
messages = ['379091', '379096', '379097', '381273', '381314']
nosy_count = 3.0
nosy_names = ['martin.panter', 'Erik Cederstrand', 'p-ganssle']
pr_nums = []
priority = 'normal'
resolution = None
stage = None
status = 'open'
superseder = None
type = None
url = 'https://bugs.python.org/issue42094'
versions = ['Python 3.10']

@ErikCederstrand
Mannequin Author

ErikCederstrand mannequin commented Oct 20, 2020

Python 3.7 gained support for parsing ISO 8601 formatted time, date and datetime strings via the fromisoformat() methods. Python has seen improved support for ISO 8601 in general; ISO calendar format codes were added in Python 3.6, and fromisocalendar() was added in Python 3.8.

ISO 8601 also has a standard for durations: https://en.wikipedia.org/wiki/ISO_8601#Durations

For consistency with the other objects in the datetime module, I suggest adding isoformat()/fromisoformat() methods for datetime.timedelta that implement ISO 8601 durations.

ISO 8601 durations support years and months that are not valid timedelta arguments because they are non-precise durations. I suggest throwing an exception if the conversion to or from timedelta cannot be done safely.
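
A minimal sketch of the behavior proposed above, assuming a hypothetical module-level helper named `parse_iso_duration` (nothing like it exists in the datetime module today). It handles only the designator form, ignores the comma decimal separator and the alternative format, and raises `ValueError` for years and months:

```python
import re
from datetime import timedelta

# Hypothetical helper, not part of the standard library.
_NUM = r"\d+(?:\.\d+)?"
_DURATION_RE = re.compile(
    r"P"
    rf"(?:(?P<years>{_NUM})Y)?"
    rf"(?:(?P<months>{_NUM})M)?"
    rf"(?:(?P<weeks>{_NUM})W)?"
    rf"(?:(?P<days>{_NUM})D)?"
    r"(?:T"
    rf"(?:(?P<hours>{_NUM})H)?"
    rf"(?:(?P<minutes>{_NUM})M)?"
    rf"(?:(?P<seconds>{_NUM})S)?"
    r")?"
)


def parse_iso_duration(text: str) -> timedelta:
    """Parse the subset of ISO 8601 durations that timedelta can represent."""
    match = _DURATION_RE.fullmatch(text)
    if match is None or text.endswith("T"):
        raise ValueError(f"invalid ISO 8601 duration: {text!r}")
    parts = {k: float(v) for k, v in match.groupdict().items() if v is not None}
    if not parts:
        raise ValueError(f"duration has no components: {text!r}")
    if "years" in parts or "months" in parts:
        # Years and months have no fixed length, so refuse rather than guess.
        raise ValueError(f"years and months are not supported: {text!r}")
    return timedelta(**parts)


print(parse_iso_duration("P1DT12H"))  # 1 day, 12:00:00
print(parse_iso_duration("PT36H"))    # 1 day, 12:00:00
```

The `T` separator is what distinguishes `P1M` (one month, rejected) from `PT1M` (one minute, accepted).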

https://pypi.org/project/isodate/ implements a parse_duration() method that could be used for inspiration.

@ErikCederstrand (mannequin) added the 3.10 and stdlib labels on Oct 20, 2020
@ErikCederstrand
Mannequin Author

ErikCederstrand mannequin commented Oct 20, 2020

Among other things, ISO 8601 duration strings are commonly used to communicate offset values in timezone definitions.

@vadmium
Member

vadmium commented Oct 20, 2020

There is related discussion in bpo-41254, about duration formats more generally.

@pganssle
Member

This is probably more feasible than the proposal in bpo-41254 since it's a well-defined spec (mostly — it includes an optional alternative format and the number of digits allowed is defined "by agreement", thus defeating the purpose of using a spec in the first place) that's not even particularly difficult to implement, but there are still a few problems (and one reason I've never implemented this, despite desperately wanting a better string representation for time deltas). Two minor problems first:

  1. Unlike ISO 8601 datetimes, these are not especially "human-friendly" formats, so I don't think they're especially useful for displaying timedeltas.

  2. Also unlike ISO 8601 datetimes, I don't think these are in particularly wide use, or widely supported. That's not a major strike against it, but if it's not useful as something to show to humans and it's not especially useful as something to show to / read from other computers, that weighs against its inclusion in the standard library.

The biggest problem, however, is that timedelta does not and cannot represent "Year" or "Month", which means that P1Y or P1M would always need to be invalid to parse. We could eliminate this format, but it means that we would never at any point in the future be able to implement a parser for the full spec. Since the concept of a year and a month are ambiguous and at least the 2016 version of ISO 8601 doesn't seem to define what it means for a duration to last 1 year or 1 month, you can't even really count on such a thing as an interchange format, because different implementations might give you different results! What does 20200131T00:00:00/P1M represent? The interval (2020-01-31, 2020-02-29)? (2020-01-31, 2020-03-02)? Something else?
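
To make the ambiguity concrete, here is a small sketch of two plausible readings of "add one month" to 2020-01-31. Both helper functions are hypothetical illustrations, not semantics defined by ISO 8601 or by any particular library:

```python
import calendar
from datetime import date, timedelta

start = date(2020, 1, 31)


def add_month_clamped(d: date) -> date:
    """Same day of the next month, clamped to that month's last day."""
    year, month = (d.year + 1, 1) if d.month == 12 else (d.year, d.month + 1)
    last_day = calendar.monthrange(year, month)[1]
    return d.replace(year=year, month=month, day=min(d.day, last_day))


def add_month_as_days(d: date) -> date:
    """Treat "one month" as the number of days in the starting month."""
    return d + timedelta(days=calendar.monthrange(d.year, d.month)[1])


print(add_month_clamped(start))  # 2020-02-29
print(add_month_as_days(start))  # 2020-03-02
```

Both results are defensible, which is why an interchange format that leaves the choice open cannot guarantee the same answer across implementations.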

A better target for parsing ISO 8601 durations would be something like dateutil.relativedelta, which does have defined semantics for years and months (though as I mentioned above, those are not necessarily consistent with the semantics of other libraries parsing or writing out this format).

I am also not entirely clear on whether "weeks" is just an alias for "7 days" or if it means something related to weeks in the ISO calendar (and if that makes a difference for durations).

I imagine that generating these formats is a bit more forgiving, because you would simply never generate the forbidden formats, and we can offer configuration options in the formatter method to allow the user to tweak the various ambiguities in the spec.
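
A sketch of what such a forgiving formatter might look like, assuming a hypothetical function `format_iso_duration`. It never emits year, month or week designators, and it writes negative values with a leading minus sign, which is an assumption borrowed from formats such as XML Schema durations:

```python
from datetime import timedelta


def format_iso_duration(td: timedelta) -> str:
    """Hypothetical formatter: emits only the D, H, M and S designators."""
    sign = "-" if td < timedelta(0) else ""
    td = abs(td)
    hours, remainder = divmod(td.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)

    date_part = f"{td.days}D" if td.days else ""
    time_part = ""
    if hours:
        time_part += f"{hours}H"
    if minutes:
        time_part += f"{minutes}M"
    if seconds or td.microseconds:
        # Fixed-point fraction with trailing zeros removed, never an exponent.
        frac = f".{td.microseconds:06d}".rstrip("0") if td.microseconds else ""
        time_part += f"{seconds}{frac}S"

    if not date_part and not time_part:
        return "PT0S"  # a zero duration still needs at least one component
    return f"{sign}P{date_part}" + (f"T{time_part}" if time_part else "")


print(format_iso_duration(timedelta(hours=36)))     # P1DT12H
print(format_iso_duration(timedelta(seconds=0.5)))  # PT0.5S
print(format_iso_duration(timedelta(minutes=-90)))  # -PT1H30M
```

Configuration options (for example, whether to emit days at all, or how many fractional digits to allow) could be added as keyword arguments; none are shown here.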

@ErikCederstrand
Mannequin Author

ErikCederstrand mannequin commented Nov 18, 2020

There are two conflicting interests: ISO 8601 that allows non-precise durations, and timedelta that assumes precise durations.

For me, the non-precise durations only make sense in date arithmetic: to a human, it's pretty clear what adding 3 months or a year will do to the date. There may be edge cases when crossing DST, but normal arithmetic with time zones also has those cases.

Regarding ISO weeks, I'm pretty sure that they are only special in regards to calculating week numbers and the weekday they start. They still have a duration of 7 days.

Apart from being able to parse ISO durations coming from other systems, the non-precise durations would be useful e.g. when implementing recurring events. Calculating a series of dates for something that happens on the 12th day of every 2nd month is doable in Python, but not with the aid of timedelta.
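
As an illustration of the recurring-event case, here is one way to do it by hand today, without timedelta. The generator name `every_nth_month` is hypothetical, and the sketch assumes the chosen day exists in every month (days above 28 would need a clamping policy, since `date()` raises `ValueError` otherwise):

```python
from datetime import date
from itertools import islice
from typing import Iterator


def every_nth_month(start: date, step_months: int) -> Iterator[date]:
    """Yield start, then the same day-of-month every `step_months` months."""
    months = start.year * 12 + (start.month - 1)  # months since year 0
    while True:
        year, month0 = divmod(months, 12)
        yield date(year, month0 + 1, start.day)
        months += step_months


# The 12th day of every 2nd month, starting in January 2024.
for d in islice(every_nth_month(date(2024, 1, 12), 2), 4):
    print(d)
# 2024-01-12
# 2024-03-12
# 2024-05-12
# 2024-07-12
```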

I see four options here:

  1. expand timedelta to allow month and year, with the implication that e.g. total_seconds() would fail or be ambiguous for these timedeltas

  2. implement only the parts of ISO 8601 that can safely be represented by the current timedelta

  3. add a new relativetimedelta class that allows representing non-precise durations

  4. do nothing and leave it to 3rd party packages to implement this

@jayaddison

> 1. implement only the parts of ISO 8601 that can safely be represented by the current timedelta

After learning about this ticket, I've attempted an implementation of timedelta.fromisoformat and timedelta.isoformat in a library called timestamp-iso8601 (published on PyPI and GitHub).

It's freshly-prepared and unreviewed so far and I'd welcome any feedback on it.

The library provides a subclass of datetime.timedelta that can be used as a drop-in replacement to parse and serialize ISO 8601 durations.

The library has no external dependencies and has been developed with performance in mind, albeit not as the primary goal. Test coverage is included in the source repository.

> 1. expand timedelta to allow month and year, with the implication that e.g. total_seconds() would fail or be ambiguous for these timedeltas

The library has some limitations, and absence of support for representation of months and years in datetime.timedelta objects certainly affects it. The code is designed to be forwards-compatible so that construction of year-aware and month-aware durations would activate if-and-when supported by datetime.timedelta.

@simon04
Contributor

simon04 commented Oct 10, 2022

> There are two conflicting interests: ISO 8601 that allows non-precise durations, and timedelta that assumes precise durations.

Go's time.ParseDuration supports units from ns to h, and strings such as "300ms", "-1.5h" or "2h45m".
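
For comparison, a rough Python sketch of a parser for that Go-style syntax. The function `parse_go_duration` is a hypothetical illustration, not a port of Go's implementation; note that timedelta has microsecond resolution, so nanosecond values are rounded:

```python
import re
from datetime import timedelta

# Seconds per unit, using the unit names accepted by Go's time.ParseDuration.
_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}
# Two-letter units come first so "ms" is not read as "m" followed by "s".
_TOKEN_RE = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|µs|ms|s|m|h)")


def parse_go_duration(text: str) -> timedelta:
    """Hypothetical parser for strings such as "300ms", "-1.5h" or "2h45m"."""
    sign, body = 1, text
    if body and body[0] in "+-":
        sign = -1 if body[0] == "-" else 1
        body = body[1:]
    if body == "0":
        return timedelta(0)
    if not body:
        raise ValueError(f"invalid duration: {text!r}")
    total, pos = timedelta(0), 0
    while pos < len(body):
        match = _TOKEN_RE.match(body, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += timedelta(seconds=float(match.group(1)) * _UNIT_SECONDS[match.group(2)])
        pos = match.end()
    return sign * total


print(parse_go_duration("300ms"))  # 0:00:00.300000
print(parse_go_duration("2h45m"))  # 2:45:00
```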

Java differentiates between time-durations implemented as java.time.Duration and date-durations implemented as java.time.Period. The former stores durations in terms of seconds and nanoseconds, and parses from units ns to h; in addition, days can be parsed as standard 24-hour days. Duration.parse(...) is implemented using a regular expression defined in https://github.com/openjdk/jdk/blob/1bfcc2790adbc273864c74faab0bd43613c75982/src/java.base/share/classes/java/time/Duration.java#L154-L157

.NET has a similar concept in TimeSpan, but it parses a different syntax.

@jayaddison

> After learning about this ticket, I've attempted an implementation of timedelta.fromisoformat and timedelta.isoformat in a library called timestamp-iso8601 (published on PyPI and GitHub).

My apologies here: the license terms that this library is currently under may have caused a license violation, and so I plan to yank the PyPI releases and make the GitHub repository private until questions about those can be resolved.

@benkehoe
Contributor

For reference in the absence of @jayaddison's code, here is an implementation of fromisoformat and isoformat that I wrote: https://gist.github.com/benkehoe/5b03c308b038b29e42106f602e554010

I believe strongly that timedelta deserves parse/format methods, but I can see the problems with not supporting the full ISO spec (in my code, I parse years and months and raise an exception about lack of support, and in the docs clarify that e.g. P1DT12H is treated identically to PT36H). The counterbalance is that while, say, Go's solution is a good alternative in isolation, it means its parse/format methods would be named and work differently from those of the rest of the datetime classes.
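
The P1DT12H versus PT36H point follows directly from how timedelta normalizes its arguments, as this short check shows (the two constructor calls stand in for what a parser would plausibly produce):

```python
from datetime import timedelta

a = timedelta(days=1, hours=12)  # what P1DT12H would parse to
b = timedelta(hours=36)          # what PT36H would parse to

print(a == b)             # True
print(a.days, a.seconds)  # 1 43200
print(b.days, b.seconds)  # 1 43200
```

Because only days, seconds and microseconds are stored, the original spelling of a duration cannot survive a parse and format round trip.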

@jayaddison

Apologies for what might be a slightly repetitive message here, but: given some concerns about the timedelta-iso8601 library, which repurposed a couple of method signatures and docstrings from cpython.git, and because I wanted to get this functionality back out there, I've re-implemented the same functionality from nowt in a clean environment without looking at or copying any code from cpython.git.

The updated library is available under an AGPLv3 license in source form as timedelta-isoformat on GitHub and packaged as a wheel named timedelta-isoformat on PyPI.

@jayaddison

@benkehoe any chance you could re-run your benchmark comparison against timedelta-isoformat v0.4.1?

@jayaddison

jayaddison commented Dec 5, 2022

As a heads-up for anyone following along (and please speak up if this is noise - I'll adjust and find a better way to communicate): timedelta-isoformat v0.4.1 remains available on GitHub, but is deprecated.

In particular, two important bugs have been addressed since that version:

  • An issue with range checks for time-segment elements: a limit of 366 days was configured within date-segment elements, and 59 seconds within time-segment elements, but the same logic was used to evaluate both. That's incorrect: in the date-segment context, 366 is an inclusive range limit, whereas in the time-segment context the limit is exclusive, so any value below 60 (including fractional values such as 59.5) is acceptable.
  • String formatting for the seconds (S) component of serialized results incorrectly relied on some string-formatting defaults, allowing the value contained within the corresponding element to be presented in scientific notation. That's not valid within ISO 8601 designator-separated fields, as far as I'm aware.
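
Both bugs are easy to reproduce in isolation. The sketch below is my reading of the description above, not code from timedelta-isoformat; the helper names are hypothetical:

```python
def valid_day_count(days: int) -> bool:
    """Date-segment check: 366 is an inclusive upper limit."""
    return 0 <= days <= 366


def valid_seconds(seconds: float) -> bool:
    """Time-segment check: 60 is an exclusive upper limit."""
    return 0 <= seconds < 60


def format_seconds(seconds: float) -> str:
    """Fixed-point text with microsecond precision and no exponent."""
    return f"{seconds:.6f}".rstrip("0").rstrip(".")


print(valid_seconds(59.5))      # True
print(valid_seconds(60))        # False
print(str(0.00001))             # 1e-05 (default float formatting)
print(format_seconds(0.00001))  # 0.00001
print(format_seconds(5.0))      # 5
```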

Additionally, an intentional decision was made to parse all values to numbers using the float type. Although in practice many duration strings are likely to contain short values (<= 10 digits), there is apparently no known linear-time algorithm to convert decimal strings into integers, so let's be on the safe side and use float parsing (until such time as a radical rethink of that policy is required, or someone knows better and can share that).

@simon04
Contributor

simon04 commented Dec 10, 2022

> Java differentiates between time-durations implemented as java.time.Duration and date-durations implemented as java.time.Period

Here's a Python implementation for fromisoformat in 25 LOC plus some unit tests: https://gist.github.com/simon04/90ad63486022fd110e5aea58e8ecb411

@pganssle
Member

I don't think we need any more implementations. The implementation here was never the problem. The big unaddressed issues are about who wants this thing and why.

If people want a human-friendly way to print timedelta, this isn't it. If people want to be able to parse arbitrary ISO8601 durations, timedelta is not the right output type. Is there a real use case for this? If not, we should work on solving the problems people have rather than creating something that almost works.

@simon04
Contributor

simon04 commented Dec 10, 2022

> Is there a real use case for this?

There definitely is! To parse delays, timeouts, lifetimes. The project https://github.com/caddyserver/caddy (not Python) has 35 usages of ParseDuration. A codebase of mine has various ad-hoc implementations and would benefit from a datetime.timedelta.fromisoformat:

datetime.timedelta(days=int(match_days.group("days")))
datetime.timedelta(hours=int(match_hours.group("hours")))
datetime.timedelta(seconds=ini.getint(section, "min_mod_diff"))
datetime.timedelta(days=float(ini.get(section, "days")))
datetime.timedelta(hours=float(ini.get(section, "hours")))
datetime.timedelta(days=int(ini.get(section, "DAYS")))
datetime.timedelta(seconds=ini.getint(section, "MaxAgeSeconds"))
datetime.timedelta(seconds=ini.getint(section, "MinAgeSeconds"))

Thanks!

@pganssle
Member

> There definitely is! To parse delays, timeouts, lifetimes.

Why do these need to be ISO 8601 durations rather than some other, better format?

@simon04
Contributor

simon04 commented Dec 10, 2022

ISO 8601 is not a strict necessity here, but a handy standard that can be used. Also for symmetry with datetime.datetime.fromisoformat

Is Go's syntax preferable?

> Go's time.ParseDuration supports units from ns to h, and strings such as "300ms", "-1.5h" or "2h45m".

@samypr100
Contributor

Note that some other notable libraries, such as pandas, also try to do this. See https://pandas.pydata.org/docs/reference/api/pandas.Timedelta.isoformat.html.

I've also used isodate quite extensively in lieu of dateutil.relativedelta.

This would be a welcome addition to the standard library.

@samypr100
Contributor

samypr100 commented Feb 25, 2024

> The big unaddressed issues are about who wants this thing and why.

@pganssle I think the number of libraries trying to accomplish the same thing (albeit with different trade-offs) is a signal to the desire.

From my experience I usually see a mix of these libraries being used to parse durations in business/data-science applications (e.g. ML generated input/output or more recent LLM input/output). For example, LLM applications can be better at generating these types of durations from natural language input than full datetimes, hence its usefulness.

@dkg

dkg commented Nov 14, 2024

if there are other standards besides ISO 8601 for duration formats, could someone point to them? there is a real need to be able to parse a standard duration format, and if ISO 8601 is not it, then i'd be grateful for a pointer to some alternative standard.

I've seen a lot of comments like "we all know what +1h2m means" but it gets pretty hairy when you start talking about variants like -1Y2.5M (does the negative apply to the whole term or just to the first part? what about when some months are longer than other months? and what is a half-month anyway? what does this duration do about leap years?)

I'd even be happy with a deliberately restrictive standard that only parses days, hours, minutes, and seconds and rejects units larger than that. (assuming that we don't care about leap seconds, so days are uniformly 86400 seconds long). but i don't know where that standard is.

@picnixz added the type-feature label and removed the 3.10 label on Nov 14, 2024
@vadmium
Member

vadmium commented Nov 14, 2024

The P prefix and lack of spacing tend to put me off ISO 8601. In general I prefer the other HTML 5 “duration string” https://html.spec.whatwg.org/multipage/common-microsyntaxes.html#durations as a readable standard format, e.g. 1w 0d 12h 0m 27.001s. It has weeks, days, hours, minutes and seconds components, with fractional seconds using a decimal point, down to 0.001 s resolution. It allows spaces between components and before units. It does not do negative durations, nor fractional minutes or higher.
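
A rough sketch of a parser for that format, written from the description above rather than from the WHATWG parsing algorithm itself. The function `parse_html_duration` is hypothetical, and rejecting a repeated unit is an assumption on my part:

```python
import re
from datetime import timedelta

_UNITS = {"w": "weeks", "d": "days", "h": "hours", "m": "minutes", "s": "seconds"}
# One component: integer, optional 1-3 digit fraction, optional space, unit.
_COMPONENT_RE = re.compile(r"\s*(\d+)(?:\.(\d{1,3}))?\s*([wdhms])\s*", re.IGNORECASE)


def parse_html_duration(text: str) -> timedelta:
    """Hypothetical parser for strings such as "1w 0d 12h 0m 27.001s"."""
    total, seen, pos = timedelta(0), set(), 0
    while pos < len(text):
        match = _COMPONENT_RE.match(text, pos)
        if match is None:
            raise ValueError(f"invalid duration string: {text!r}")
        whole, frac, unit = match.group(1), match.group(2), match.group(3).lower()
        if frac is not None and unit != "s":
            raise ValueError("only seconds may have a fractional part")
        if unit in seen:
            raise ValueError(f"unit {unit!r} appears more than once")
        seen.add(unit)
        total += timedelta(**{_UNITS[unit]: int(whole)})
        if frac is not None:
            total += timedelta(milliseconds=int(frac.ljust(3, "0")))
        pos = match.end()
    if not seen:
        raise ValueError(f"invalid duration string: {text!r}")
    return total


print(parse_html_duration("1w 0d 12h 0m 27.001s"))  # 7 days, 12:00:27.001000
```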

For negative durations, I agree −1h 2m would be ambiguous (especially with the space included). I’d consider putting it in brackets: −(1h 2m). Or relying on ISO 8601; I don’t think −PT1h2m is so ambiguous.

10 participants