Support datetime extended type #228
Merged: DifferentialOrange merged 3 commits into master from DifferentialOrange/gh-204-datetime on Sep 26, 2022
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1,3 @@ | ||
msgpack>=1.0.4 | ||
pandas | ||
pytz |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
from tarantool.msgpack_ext.types.datetime import Datetime | ||
|
||
EXT_ID = 4 | ||
|
||
def encode(obj): | ||
return obj.msgpack_encode() | ||
|
||
def decode(data): | ||
return Datetime(data) |
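
The module above only exposes `EXT_ID`, `encode` and `decode`; how the connector routes a MessagePack extension payload to them is not shown in this diff. A minimal sketch of one possible dispatch, assuming a plain dictionary registry, is given below. The names `DECODERS`, `ext_hook` and `decode_datetime_stub` are hypothetical, and the stub only reads the seconds field instead of building a real `Datetime`.

```python
# Hypothetical registry: extension type id -> decoder callable.
EXT_DATETIME = 4  # same value as EXT_ID in the module above


def decode_datetime_stub(data: bytes) -> tuple:
    """Stand-in for Datetime(data): reads only the 8-byte seconds field."""
    seconds = int.from_bytes(data[:8], 'little', signed=True)
    return ('datetime', seconds)


DECODERS = {EXT_DATETIME: decode_datetime_stub}


def ext_hook(code: int, data: bytes):
    """Dispatch an extension payload by its type id (illustrative only)."""
    decoder = DECODERS.get(code)
    if decoder is None:
        raise ValueError(f'Unknown extension type id {code}')
    return decoder(data)
```

Keeping the id-to-decoder mapping in one place would let further extension types be registered without touching the dispatch function.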
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,264 @@ | ||
from copy import deepcopy | ||
|
||
import pandas | ||
import pytz | ||
|
||
import tarantool.msgpack_ext.types.timezones as tt_timezones | ||
from tarantool.error import MsgpackError | ||
|
||
# https://www.tarantool.io/en/doc/latest/dev_guide/internals/msgpack_extensions/#the-datetime-type | ||
# | ||
# The datetime MessagePack representation looks like this: | ||
# +---------+----------------+==========+-----------------+ | ||
# | MP_EXT | MP_DATETIME | seconds | nsec; tzoffset; | | ||
# | = d7/d8 | = 4 | | tzindex; | | ||
# +---------+----------------+==========+-----------------+ | ||
# MessagePack data contains: | ||
# | ||
# * Seconds (8 bytes) as an unencoded 64-bit signed integer stored in the | ||
# little-endian order. | ||
# * The optional fields (8 bytes), if any of them have a non-zero value. | ||
# The fields include nsec (4 bytes), tzoffset (2 bytes), and | ||
# tzindex (2 bytes) packed in the little-endian order. | ||
# | ||
# seconds is seconds since Epoch, where the epoch is the point where the time | ||
# starts, and is platform dependent. For Unix, the epoch is January 1, | ||
# 1970, 00:00:00 (UTC). Tarantool uses a double type, see a structure | ||
# definition in src/lib/core/datetime.h and reasons in | ||
# https://github.com/tarantool/tarantool/wiki/Datetime-internals#intervals-in-c | ||
# | ||
# nsec is nanoseconds, the fractional part of seconds. Tarantool uses int32_t, see | ||
# a definition in src/lib/core/datetime.h. | ||
# | ||
# tzoffset is the timezone offset in minutes from UTC. Tarantool uses an int16_t type, | ||
# see a structure definition in src/lib/core/datetime.h. | ||
# | ||
# tzindex is the Olson timezone id. Tarantool uses an int16_t type, see a structure | ||
# definition in src/lib/core/datetime.h. If both tzoffset and tzindex are | ||
# specified, tzindex takes precedence and the tzoffset value is ignored. | ||
|
||
SECONDS_SIZE_BYTES = 8 | ||
NSEC_SIZE_BYTES = 4 | ||
TZOFFSET_SIZE_BYTES = 2 | ||
TZINDEX_SIZE_BYTES = 2 | ||
|
||
BYTEORDER = 'little' | ||
|
||
NSEC_IN_SEC = 1000000000 | ||
NSEC_IN_MKSEC = 1000 | ||
SEC_IN_MIN = 60 | ||
|
||
def get_bytes_as_int(data, cursor, size): | ||
part = data[cursor:cursor + size] | ||
return int.from_bytes(part, BYTEORDER, signed=True), cursor + size | ||
|
||
def get_int_as_bytes(data, size): | ||
return data.to_bytes(size, byteorder=BYTEORDER, signed=True) | ||
|
||
def compute_offset(timestamp): | ||
utc_offset = timestamp.tzinfo.utcoffset(timestamp) | ||
|
||
# `None` offset is a valid utcoffset implementation, | ||
# but it seems that pytz timezones never return `None`: | ||
# https://github.com/pandas-dev/pandas/issues/15986 | ||
assert utc_offset is not None | ||
|
||
# There is no precision loss since offset is in minutes | ||
return int(utc_offset.total_seconds()) // SEC_IN_MIN | ||
|
||
def get_python_tzinfo(tz, error_class): | ||
if tz in pytz.all_timezones: | ||
return pytz.timezone(tz) | ||
|
||
# Checked with timezones/validate_timezones.py | ||
tt_tzinfo = tt_timezones.timezoneAbbrevInfo[tz] | ||
if (tt_tzinfo['category'] & tt_timezones.TZ_AMBIGUOUS) != 0: | ||
raise error_class(f'Failed to create datetime with ambiguous timezone "{tz}"') | ||
|
||
return pytz.FixedOffset(tt_tzinfo['offset']) | ||
|
||
def msgpack_decode(data): | ||
cursor = 0 | ||
seconds, cursor = get_bytes_as_int(data, cursor, SECONDS_SIZE_BYTES) | ||
|
||
data_len = len(data) | ||
if data_len == (SECONDS_SIZE_BYTES + NSEC_SIZE_BYTES + \ | ||
TZOFFSET_SIZE_BYTES + TZINDEX_SIZE_BYTES): | ||
nsec, cursor = get_bytes_as_int(data, cursor, NSEC_SIZE_BYTES) | ||
tzoffset, cursor = get_bytes_as_int(data, cursor, TZOFFSET_SIZE_BYTES) | ||
tzindex, cursor = get_bytes_as_int(data, cursor, TZINDEX_SIZE_BYTES) | ||
elif data_len == SECONDS_SIZE_BYTES: | ||
nsec = 0 | ||
tzoffset = 0 | ||
tzindex = 0 | ||
else: | ||
raise MsgpackError(f'Unexpected datetime payload length {data_len}') | ||
|
||
total_nsec = seconds * NSEC_IN_SEC + nsec | ||
datetime = pandas.to_datetime(total_nsec, unit='ns') | ||
|
||
if tzindex != 0: | ||
if tzindex not in tt_timezones.indexToTimezone: | ||
raise MsgpackError(f'Failed to decode datetime with unknown tzindex "{tzindex}"') | ||
tz = tt_timezones.indexToTimezone[tzindex] | ||
tzinfo = get_python_tzinfo(tz, MsgpackError) | ||
return datetime.replace(tzinfo=pytz.UTC).tz_convert(tzinfo), tz | ||
elif tzoffset != 0: | ||
tzinfo = pytz.FixedOffset(tzoffset) | ||
return datetime.replace(tzinfo=pytz.UTC).tz_convert(tzinfo), '' | ||
else: | ||
return datetime, '' | ||
|
||
|
||
class Datetime(): | ||
def __init__(self, data=None, *, timestamp=None, year=None, month=None, | ||
day=None, hour=None, minute=None, sec=None, nsec=None, | ||
tzoffset=0, tz=''): | ||
|
||
if data is not None: | ||
if not isinstance(data, bytes): | ||
raise ValueError('data argument (first positional argument) ' + | ||
'expected to be a "bytes" instance') | ||
|
||
datetime, tz = msgpack_decode(data) | ||
self._datetime = datetime | ||
self._tz = tz | ||
return | ||
|
||
# The logic is the same as in Tarantool; refer to the datetime API. | ||
# https://www.tarantool.io/en/doc/latest/reference/reference_lua/datetime/new/ | ||
if timestamp is not None: | ||
if ((year is not None) or (month is not None) or \ | ||
(day is not None) or (hour is not None) or \ | ||
(minute is not None) or (sec is not None)): | ||
raise ValueError('Cannot provide both timestamp and year, month, ' + | ||
'day, hour, minute, sec') | ||
|
||
if nsec is not None: | ||
if not isinstance(timestamp, int): | ||
raise ValueError('timestamp must be int if nsec provided') | ||
|
||
total_nsec = timestamp * NSEC_IN_SEC + nsec | ||
datetime = pandas.to_datetime(total_nsec, unit='ns') | ||
else: | ||
datetime = pandas.to_datetime(timestamp, unit='s') | ||
else: | ||
if nsec is not None: | ||
microsecond = nsec // NSEC_IN_MKSEC | ||
nanosecond = nsec % NSEC_IN_MKSEC | ||
else: | ||
microsecond = 0 | ||
nanosecond = 0 | ||
|
||
datetime = pandas.Timestamp(year=year, month=month, day=day, | ||
hour=hour, minute=minute, second=sec, | ||
microsecond=microsecond, | ||
nanosecond=nanosecond) | ||
|
||
if tz != '': | ||
if tz not in tt_timezones.timezoneToIndex: | ||
raise ValueError(f'Unknown Tarantool timezone "{tz}"') | ||
|
||
tzinfo = get_python_tzinfo(tz, ValueError) | ||
self._datetime = datetime.replace(tzinfo=tzinfo) | ||
self._tz = tz | ||
elif tzoffset != 0: | ||
tzinfo = pytz.FixedOffset(tzoffset) | ||
self._datetime = datetime.replace(tzinfo=tzinfo) | ||
self._tz = '' | ||
else: | ||
self._datetime = datetime | ||
self._tz = '' | ||
|
||
|
||
def __eq__(self, other): | ||
if isinstance(other, Datetime): | ||
return self._datetime == other._datetime | ||
elif isinstance(other, pandas.Timestamp): | ||
return self._datetime == other | ||
else: | ||
return False | ||
|
||
def __str__(self): | ||
return self._datetime.__str__() | ||
|
||
|
||
def __repr__(self): | ||
return f'datetime: {self._datetime.__repr__()}, tz: "{self.tz}"' | ||
|
||
def __copy__(self): | ||
cls = self.__class__ | ||
result = cls.__new__(cls) | ||
result.__dict__.update(self.__dict__) | ||
return result | ||
|
||
def __deepcopy__(self, memo): | ||
cls = self.__class__ | ||
result = cls.__new__(cls) | ||
memo[id(self)] = result | ||
for k, v in self.__dict__.items(): | ||
setattr(result, k, deepcopy(v, memo)) | ||
return result | ||
|
||
@property | ||
def year(self): | ||
return self._datetime.year | ||
|
||
@property | ||
def month(self): | ||
return self._datetime.month | ||
|
||
@property | ||
def day(self): | ||
return self._datetime.day | ||
|
||
@property | ||
def hour(self): | ||
return self._datetime.hour | ||
|
||
@property | ||
def minute(self): | ||
return self._datetime.minute | ||
|
||
@property | ||
def sec(self): | ||
return self._datetime.second | ||
|
||
@property | ||
def nsec(self): | ||
# microseconds + nanoseconds | ||
return self._datetime.value % NSEC_IN_SEC | ||
|
||
@property | ||
def timestamp(self): | ||
return self._datetime.timestamp() | ||
|
||
@property | ||
def tzoffset(self): | ||
if self._datetime.tzinfo is not None: | ||
return compute_offset(self._datetime) | ||
return 0 | ||
|
||
@property | ||
def tz(self): | ||
return self._tz | ||
|
||
@property | ||
def value(self): | ||
return self._datetime.value | ||
|
||
def msgpack_encode(self): | ||
seconds = self.value // NSEC_IN_SEC | ||
nsec = self.nsec | ||
tzoffset = self.tzoffset | ||
|
||
tz = self.tz | ||
if tz != '': | ||
tzindex = tt_timezones.timezoneToIndex[tz] | ||
else: | ||
tzindex = 0 | ||
|
||
buf = get_int_as_bytes(seconds, SECONDS_SIZE_BYTES) | ||
|
||
if (nsec != 0) or (tzoffset != 0) or (tzindex != 0): | ||
buf = buf + get_int_as_bytes(nsec, NSEC_SIZE_BYTES) | ||
buf = buf + get_int_as_bytes(tzoffset, TZOFFSET_SIZE_BYTES) | ||
buf = buf + get_int_as_bytes(tzindex, TZINDEX_SIZE_BYTES) | ||
|
||
return buf |
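
The payload layout described in the comment block (8 bytes of seconds, then an optional 8-byte tail of nsec, tzoffset and tzindex, all little-endian) can be reproduced with the standard `struct` module. The sketch below is not the code from this PR: `pack_datetime` and `unpack_datetime` are hypothetical helpers that take a total nanosecond count instead of a `pandas.Timestamp`. It also shows why floor division keeps `nsec` non-negative for timestamps before the epoch.

```python
import struct

NSEC_IN_SEC = 1_000_000_000


def pack_datetime(total_nsec: int, tzoffset: int = 0, tzindex: int = 0) -> bytes:
    """Pack a datetime payload; the tail is emitted only when it is non-zero."""
    # divmod floors toward negative infinity, so nsec stays in [0, 1e9).
    seconds, nsec = divmod(total_nsec, NSEC_IN_SEC)
    buf = struct.pack('<q', seconds)
    if nsec or tzoffset or tzindex:
        buf += struct.pack('<ihh', nsec, tzoffset, tzindex)
    return buf


def unpack_datetime(data: bytes) -> tuple:
    """Return (seconds, nsec, tzoffset, tzindex) from an 8- or 16-byte payload."""
    if len(data) == 8:
        (seconds,) = struct.unpack('<q', data)
        return seconds, 0, 0, 0
    if len(data) == 16:
        return struct.unpack('<qihh', data)
    raise ValueError(f'Unexpected datetime payload length {len(data)}')
```

For example, one nanosecond before the epoch is stored as `seconds = -1` and `nsec = 999999999`, which matches the `value // NSEC_IN_SEC` and `value % NSEC_IN_SEC` split used in `msgpack_encode`.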
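
The constructor and the decoder treat timezones differently: `Datetime.__init__` attaches the zone to the given wall-clock fields with `replace(tzinfo=...)`, while `msgpack_decode` interprets the stored seconds as UTC and then converts with `tz_convert`. The sketch below illustrates the difference using only the standard library, with `astimezone` as an assumed analogue of pandas `tz_convert` and a fixed offset of 180 minutes as sample data.

```python
from datetime import datetime, timedelta, timezone

# A fixed +03:00 zone, comparable to pytz.FixedOffset(180) in the diff.
offset = timezone(timedelta(minutes=180))
naive = datetime(2022, 9, 26, 12, 0, 0)

# Constructor path: attach the zone, the wall-clock time stays 12:00.
attached = naive.replace(tzinfo=offset)

# Decode path: the value is UTC, converting shifts the wall clock to 15:00.
converted = naive.replace(tzinfo=timezone.utc).astimezone(offset)

print(attached.isoformat())
print(converted.isoformat())
```

The two results carry the same offset but are different instants, three hours apart, so an encode/decode round trip has to go through the UTC-based seconds value and not through the wall-clock fields.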
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
from tarantool.msgpack_ext.types.timezones.timezones import ( | ||
TZ_AMBIGUOUS, | ||
indexToTimezone, | ||
timezoneToIndex, | ||
timezoneAbbrevInfo, | ||
) | ||
|
||
__all__ = ['TZ_AMBIGUOUS', 'indexToTimezone', 'timezoneToIndex', | ||
'timezoneAbbrevInfo'] |
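
The contents of the generated `timezones` module are not part of this view, so the tables below are hypothetical stand-ins with invented entries and an invented flag value. Under that assumption, the sketch shows how lookups like those in `get_python_tzinfo` might resolve an abbreviation to a fixed offset and reject ambiguous ones.

```python
# All values below are hypothetical sample data, not Tarantool's real tables.
TZ_AMBIGUOUS = 0x04

index_to_timezone = {1: 'MSK', 2: 'AST'}
# The reverse mapping can be derived from the forward one.
timezone_to_index = {name: idx for idx, name in index_to_timezone.items()}

timezone_abbrev_info = {
    'MSK': {'offset': 180, 'category': 0x02},
    'AST': {'offset': -240, 'category': 0x02 | TZ_AMBIGUOUS},
}


def fixed_offset_minutes(tz: str) -> int:
    """Return the offset in minutes, refusing ambiguous abbreviations."""
    info = timezone_abbrev_info[tz]
    if (info['category'] & TZ_AMBIGUOUS) != 0:
        raise ValueError(f'Failed to create datetime with ambiguous timezone "{tz}"')
    return info['offset']
```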