Skip to content

Conversation

smortex
Copy link
Contributor

@smortex smortex commented May 3, 2019

Python added a %f field ("Microsecond as a decimal number") in it's strptime() implementation to parse times with subsecond precision.

Borrow this idea and implement it in syslog-ng using the same field descriptor.

TODO:

  • Add support for %f field to parse microseconds;
  • Eat extra digits (i.e. nanosecond precision timestamp, while we only handle microseconds) see request;
  • Move the time parsing code to wall_clock_time_strptime() see comment

@kira-syslogng
Copy link
Contributor

This user does not have permission to start the build. Can one of the admins verify this patch and start the build?
(admin: you have the next options (make sure you checked the code):
"ok to test" to accept this pull request (and further changes) for testing
"test this please" for a one time test run
"add to whitelist" add author of a Pull Request to whitelist (globally, be careful, it means this user can trigger kira for any PR)
do nothing -> CI won't start)

1 similar comment
@kira-syslogng
Copy link
Contributor

This user does not have permission to start the build. Can one of the admins verify this patch and start the build?
(admin: you have the next options (make sure you checked the code):
"ok to test" to accept this pull request (and further changes) for testing
"test this please" for a one time test run
"add to whitelist" add author of a Pull Request to whitelist (globally, be careful, it means this user can trigger kira for any PR)
do nothing -> CI won't start)

@Kokan
Copy link
Collaborator

Kokan commented May 3, 2019

@kira-syslogng ok to test

@kira-syslogng
Copy link
Contributor

Build SUCCESS

@bazsi
Copy link
Collaborator

bazsi commented May 4, 2019

This is actually something I had on my shortlist to implement, and in exactly the same manner (using %f to follow Python's lead here). Thank you.

There's one more problem I originally wanted to solve: since the microseconds part is optional, the dot should be parsed as the part of that piece, and I know %f does not consume the dot, but maybe we should implement a %F which does.

An example:

timestamp: Apr 29 13:58:40.411
date-parser(format("%b %d %H:%M:%S%F"));

Hmm I just noticed that %F is already taken, maybe we could use a multi-letter variant, like "%.f"

what do you think?

bazsi
bazsi previously approved these changes May 4, 2019
Copy link
Collaborator

@bazsi bazsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done! Appreciate the testcases as well. I had some nitpicks as comments, but those are all optional.

@kira-syslogng
Copy link
Contributor

Build SUCCESS

@smortex
Copy link
Contributor Author

smortex commented May 4, 2019

I amended my commit with style fixes, and added a new one to eat extra digit in the usec field as suggested.


Regarding consuming the dot ., I happen to be working on this PR after PuppetDB has changed the way it logs timestamps:

  • 2017-02-02 00:29:16,706 (old, 5.x I guess)
  • 2019-05-04T21:55:46.989+02:00 (new 6.x)

As you can see, the old format seems to use a coma , for separating seconds and millisecond ; while the new format use a dot .. So, as far as I am concerned, leaving responsibility of matching the right symbol outside the field seems more appropriate to me, and the complexity of a modifier does not worth it. It also match what Python does with the %f field.

Maybe we can keep it that way for now, and rely on users feedback to determine if it should be extended it in the future?

@smortex smortex changed the title Add support for usec precision when parsing time [WIP] Add support for usec precision when parsing time May 4, 2019
@bazsi
Copy link
Collaborator

bazsi commented May 6, 2019

I think it makes sense, and there's one more place this could be addressed: introduce support for multiple format() arguments to date-parser, trying them in order and accepting the timestamp at first success.

For example:

date-parser(format(
     "%b %d %H:%M:%S.%f",
     "%b %d %H:%M:%S,%f"
     "%b %d %H:%M:%S",
    ));

This would handle either ',' or '.' and even with the fractions omitted. I have a relevant use-case in the cisco-parser:

        if {
            # timestamp from Cisco Unified Call Manager, example "Jun 14 11:57:27 PM.685 UTC"
            # NOTE: drops msecs and timezone
            filter { match('^[.*]?([A-Za-z]{3} [0-9 ]\d \d{2}:\d{2}:\d{2} ((AM)|(PM)))' value('1') flags(store-matches)); };
            parser { date-parser(format('%b %d %H:%M:%S %p') template("$1")); };
        } elif {
            # timestamp without AM/PM mark, example: "Apr 29 13:58:40.411"
            # NOTE: drops msecs and timezone

            filter { match('^[.*]?([A-Za-z]{3} [0-9 ]\d \d{2}:\d{2}:\d{2})' value('1') flags(store-matches)); };
            parser { date-parser(format('%b %d %H:%M:%S') template("$1")); };
        } else {
            # timestamp with year information, example: "Apr 29 2017 13:58:40.411"
            filter { match('^[.*]?([A-Za-z]{3} [0-9 ]\d \d{4} \d{2}:\d{2}:\d{2})' value('1') flags(store-matches)); };
            parser { date-parser(format('%b %d %Y %H:%M:%S') template("$1")); };
        };

Would you give this a go, even in a separate PR? That would be awesome.

@smortex smortex changed the title [WIP] Add support for usec precision when parsing time Add support for usec precision when parsing time May 6, 2019
@smortex
Copy link
Contributor Author

smortex commented May 6, 2019

@bazsi I added a few commits to move the code around. Is it what you expected?

Regarding adding support for multiple formats to date-parser(), it seems a good idea, but I will not be able to work on this topic right now, so I would recommend splitting this into a separate issue / PR.

@kira-syslogng
Copy link
Contributor

Build SUCCESS

Copy link
Collaborator

@bazsi bazsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some minor requests in comments, but the code looks good. If you guys these I can approve this.

@kira-syslogng
Copy link
Contributor

Build SUCCESS

bazsi
bazsi previously approved these changes May 8, 2019
Copy link
Collaborator

@bazsi bazsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Maybe you could squash your commits so that other reviewers will have an easier time reviewing the changes. Thanks. But if you do that, please make sure that non-functional changes (e.g. moving of the code) remain separated from functional changes.

@smortex
Copy link
Contributor Author

smortex commented May 13, 2019

@bazsi, I am back from holidays 🌴

I squashed the related commits as requested and rebased my work on top of the master branch.

@kira-syslogng
Copy link
Contributor

Build SUCCESS

furiel
furiel previously approved these changes May 14, 2019
Copy link
Collaborator

@furiel furiel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thank you for your contribution!

smortex added 3 commits May 13, 2019 20:56
Python added a '%f' field ("Microsecond as a decimal number") in it's
strptime() implementation to parse times with subsecond precision.

Borrow this idea and implement it in syslog-ng using the same field
descriptor.

Eat all digitis when reading microseconds: if the field has a nanosecond
precision, the value will be truncated at the microsecond and the parser
will be able to process the information following the sub-second
information.
Update strptime_with_tz() signature so that it's parameters match those
of wall_clock_time_strptime().

While here, get rid of _u_char and _uint macros: these macros where
introduced when borrowing code from NetBSD, since the code has forked it
makes no sense to keep them around.

No functional change.
Remove trptime-tz.c and trptime-tz.h which are now useless.

No functional change.
SYSLOG_NG_HAVE_TIMEZONE is always set.
@kira-syslogng
Copy link
Contributor

Build SUCCESS

Copy link
Collaborator

@bazsi bazsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job! Approved on my part too.

@bazsi bazsi merged commit 8b8da7b into syslog-ng:master May 14, 2019
smortex added a commit to smortex/syslog-ng-ose-guides that referenced this pull request May 14, 2019
@smortex smortex deleted the strptime-usec branch May 14, 2019 20:28
@lbudai lbudai added this to the syslog-ng-3.22 milestone May 17, 2019
smortex added a commit to smortex/syslog-ng that referenced this pull request Jun 10, 2019
The conditional assignment to wct->wct_gmtoff was [removed in syslog-ng#2709][1]
because we though that the field previously did not exist in some
circumstances which where not true anymore.  In fact, the macros is here
to help us detect what `timezone` is, since it's not the same on all
platforms:

On Linux, timezone is declared in time.h as:
extern long int timezone;

On FreeBSD, timezone is declared in time.h as:
char *timezone(int zone, int dst);

As a consequence, the assignment on FreeBSD produce a compiler error:

```
  CC       lib/timeutils/libsyslog_ng_la-wallclocktime.lo
lib/timeutils/wallclocktime.c:551:37: error: invalid argument type 'char *(*)(int, int)' to unary expression
                  wct->wct_gmtoff = -(timezone);
                                    ^~~~~~~~~~~
1 error generated.
```

Re-add the removed #ifdef to fix the build on FreeBSD.

[1]: syslog-ng#2709 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants