strftime() mishandles quoted percent, e.g. %%z #16155


Closed
tiran opened this issue Jan 30, 2022 · 4 comments · Fixed by #16184

Comments

@tiran
Contributor

tiran commented Jan 30, 2022

Emscripten's strftime() does not parse a quoted percent sign correctly. It treats %%z as % + %z instead of %% + z, so for %%z the function produces %+0000 instead of %z. I detected the problem in the CPython port to Emscripten.

Version of emscripten/emsdk:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.1 (1934a98e709b57d3592b8272d3f1264a72c089e4)
clang version 14.0.0 (https://github.com/llvm/llvm-project f142c45f1e494f8dbdcc1bcf14122d128ac8f3fe)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /emsdk/upstream/bin

reproducer

#include <time.h>
#include <stdio.h>
#include <stdlib.h>

int
main(void)
{
    char buf[256];
    char *fmt = "%H:%M:%S %%Z=%Z %%z=%z";
    time_t t;
    struct tm *tm;

    t = time(NULL);
    tm = localtime(&t);

    strftime(buf, sizeof(buf), fmt, tm);
    fprintf(stdout, "%s -> %s\n", fmt, buf);
    return 0;
}

glibc output

# gcc -o fmttime fmttime.c && ./fmttime  
%H:%M:%S %%Z=%Z %%z=%z -> 15:33:56 %Z=UTC %z=+0000

emcc + node output

# emcc -o fmttime.js fmttime.c && node fmttime.js 
%H:%M:%S %%Z=%Z %%z=%z -> 15:34:28 %Coordinated Universal Time=Coordinated Universal Time %+0000=+0000
@kripken
Member

kripken commented Jan 31, 2022

Looking at the JS code for strftime, there isn't a trivial fix. The main processing code applies a series of regex replacements, and a different approach is necessary to handle ambiguous patterns like this one.
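
To make the ambiguity concrete, here is a minimal C sketch. It is not Emscripten's actual JS code: `replace_all`, `naive_format`, and `scan_format` are hypothetical names, only `%z` and `%%` are handled, and the zone text is passed in as a plain string. `naive_format` mimics substitution in passes and reproduces the reported behaviour, while `scan_format` consumes the format left to right so that `%%` is recognised before the following character is looked at.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper standing in for one regex pass: copy `s` into `out`,
   replacing each non-overlapping occurrence of `from` with `to`. */
static void replace_all(const char *s, const char *from, const char *to,
                        char *out, size_t outsz) {
    size_t flen = strlen(from), tlen = strlen(to), n = 0;
    assert(outsz > 0 && flen > 0);
    while (*s && n + 1 < outsz) {
        if (strncmp(s, from, flen) == 0 && n + tlen < outsz) {
            memcpy(out + n, to, tlen);
            n += tlen;
            s += flen;
        } else {
            out[n++] = *s++;
        }
    }
    out[n] = '\0';
}

/* Multi-pass substitution: expand %z everywhere, then collapse %%.
   The first pass cannot tell that the '%' in "%%z" is already quoted. */
static void naive_format(const char *fmt, const char *zone,
                         char *out, size_t outsz) {
    char tmp[256];
    replace_all(fmt, "%z", zone, tmp, sizeof tmp);
    replace_all(tmp, "%%", "%", out, outsz);
}

/* Single left-to-right scan: each '%' is paired with the character that
   follows it exactly once, so "%%z" becomes a literal "%z". */
static void scan_format(const char *fmt, const char *zone,
                        char *out, size_t outsz) {
    size_t n = 0;
    assert(outsz > 0);
    while (*fmt && n + 1 < outsz) {
        if (fmt[0] == '%' && fmt[1] == '%') {
            out[n++] = '%';
            fmt += 2;
        } else if (fmt[0] == '%' && fmt[1] == 'z') {
            size_t zlen = strlen(zone);
            if (n + zlen >= outsz) break;  /* no room left: truncate */
            memcpy(out + n, zone, zlen);
            n += zlen;
            fmt += 2;
        } else {
            out[n++] = *fmt++;
        }
    }
    out[n] = '\0';
}
```

With the format `%%z=%z` and zone `+0000`, `naive_format` yields `%+0000=+0000` (the shape seen in the emcc output above) and `scan_format` yields `%z=+0000`. Swapping the two passes in `naive_format` does not help: collapsing `%%` first turns `%%z` into `%z`, which the second pass then expands.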

I took a look at the cost of using the musl code for the time functions instead. It looks like a 2-3% code size increase in a few of our tests, which is not horrible but is better avoided, so I'm not sure we want to go down that route. Here is the diff if anyone else wants to experiment:

commit 7bb33569a375d407407894f71ba279ba3b6bffd9
Author: Alon Zakai <[email protected]>
Date:   Mon Jan 31 13:48:37 2022 -0800

    wip

diff --git a/system/lib/libc/wasi-helpers.c b/system/lib/libc/wasi-helpers.c
index b71ac92c8..0eaba304e 100644
--- a/system/lib/libc/wasi-helpers.c
+++ b/system/lib/libc/wasi-helpers.c
@@ -7,6 +7,7 @@
 
 #include <errno.h>
 #include <stdlib.h>
+#include <time.h>
 #include <wasi/api.h>
 #include <wasi/wasi-helpers.h>
 
@@ -26,3 +27,10 @@ int  __wasi_fd_is_valid(__wasi_fd_t fd) {
   }
   return 1;
 }
+
+#define NSEC_PER_SEC (1000 * 1000 * 1000)
+
+struct timespec __wasi_timestamp_to_timespec(__wasi_timestamp_t timestamp) {
+  return (struct timespec){.tv_sec = timestamp / NSEC_PER_SEC,
+                           .tv_nsec = timestamp % NSEC_PER_SEC};
+}
diff --git a/system/lib/standalone/standalone.c b/system/lib/standalone/standalone.c
index 227136d81..9492cc320 100644
--- a/system/lib/standalone/standalone.c
+++ b/system/lib/standalone/standalone.c
@@ -34,13 +34,6 @@ _Static_assert(CLOCK_MONOTONIC == __WASI_CLOCKID_MONOTONIC, "must match");
 _Static_assert(CLOCK_PROCESS_CPUTIME_ID == __WASI_CLOCKID_PROCESS_CPUTIME_ID, "must match");
 _Static_assert(CLOCK_THREAD_CPUTIME_ID == __WASI_CLOCKID_THREAD_CPUTIME_ID, "must match");
 
-#define NSEC_PER_SEC (1000 * 1000 * 1000)
-
-struct timespec __wasi_timestamp_to_timespec(__wasi_timestamp_t timestamp) {
-  return (struct timespec){.tv_sec = timestamp / NSEC_PER_SEC,
-                           .tv_nsec = timestamp % NSEC_PER_SEC};
-}
-
 int clock_getres(clockid_t clk_id, struct timespec *tp) {
   // See https://github.com/bytecodealliance/wasmtime/issues/3714
   if (clk_id > __WASI_CLOCKID_THREAD_CPUTIME_ID || clk_id < 0) {
diff --git a/tools/system_libs.py b/tools/system_libs.py
index cc403d96f..a826521ef 100644
--- a/tools/system_libs.py
+++ b/tools/system_libs.py
@@ -906,7 +906,21 @@ class libc(MuslInternalLibrary,
           'nanosleep.c',
           'clock_nanosleep.c',
           'ctime_r.c',
-        ])
+          'strftime.c',
+          '__month_to_secs.c',
+          '__secs_to_tm.c',
+          '__tm_to_secs.c',
+          '__tz.c',
+          '__year_to_secs.c',
+          'clock.c',
+          'clock_gettime.c',
+          'difftime.c',
+          'gettimeofday.c',
+          'localtime_r.c',
+          'gmtime_r.c',
+          'mktime.c',
+          'timegm.c',
+          'time.c'])
     libc_files += files_in_path(
         path='system/lib/libc/musl/src/legacy',
         filenames=['getpagesize.c', 'err.c'])
@@ -1558,24 +1572,6 @@ class libstandalonewasm(MuslInternalLibrary):
     files += files_in_path(
         path='system/lib/libc',
         filenames=['emscripten_memcpy.c'])
-    # It is more efficient to use JS methods for time, normally.
-    files += files_in_path(
-        path='system/lib/libc/musl/src/time',
-        filenames=['strftime.c',
-                   '__month_to_secs.c',
-                   '__secs_to_tm.c',
-                   '__tm_to_secs.c',
-                   '__tz.c',
-                   '__year_to_secs.c',
-                   'clock.c',
-                   'clock_gettime.c',
-                   'difftime.c',
-                   'gettimeofday.c',
-                   'localtime_r.c',
-                   'gmtime_r.c',
-                   'mktime.c',
-                   'timegm.c',
-                   'time.c'])
     # It is more efficient to use JS for __assert_fail, as it avoids always
     # including fprintf etc.
     files += files_in_path(

@sbc100
Collaborator

sbc100 commented Feb 2, 2022

Presumably this means that this issue does not affect -sSTANDALONE_WASM builds? Is using that flag an option for you, @tiran? Otherwise I guess we could consider adding some way to opt into this less efficient codepath?

@sbc100
Collaborator

sbc100 commented Feb 2, 2022

Does this mean that every use of %% that precedes a valid format char is broken? How about this approach: replace all occurrences of %% before applying the formatting regexes, and then inject them back afterwards?
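
A rough C sketch of that idea, under stated assumptions: the real code is JS and regex based, so `replace_all` and `format_with_placeholder` are hypothetical stand-ins, only `%z` is expanded, and the placeholder byte `\x01` is assumed never to occur in a format string or in the substituted text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: this byte never appears in a format string or zone text. */
#define ESC_PLACEHOLDER '\x01'

/* Hypothetical helper standing in for one regex pass: copy `s` into `out`,
   replacing each non-overlapping occurrence of `from` with `to`. */
static void replace_all(const char *s, const char *from, const char *to,
                        char *out, size_t outsz) {
    size_t flen = strlen(from), tlen = strlen(to), n = 0;
    assert(outsz > 0 && flen > 0);
    while (*s && n + 1 < outsz) {
        if (strncmp(s, from, flen) == 0 && n + tlen < outsz) {
            memcpy(out + n, to, tlen);
            n += tlen;
            s += flen;
        } else {
            out[n++] = *s++;
        }
    }
    out[n] = '\0';
}

static void format_with_placeholder(const char *fmt, const char *zone,
                                    char *out, size_t outsz) {
    char hidden[256];
    const char ph[2] = {ESC_PLACEHOLDER, '\0'};
    size_t i;

    /* 1. Hide every "%%" so later passes cannot see its second '%'. */
    replace_all(fmt, "%%", ph, hidden, sizeof hidden);
    /* 2. Run the ordinary substitution pass(es); only %z in this sketch. */
    replace_all(hidden, "%z", zone, out, outsz);
    /* 3. Put a single literal '%' back wherever "%%" used to be. */
    for (i = 0; out[i] != '\0'; i++) {
        if (out[i] == ESC_PLACEHOLDER) out[i] = '%';
    }
}
```

Because step 1 scans left to right without overlapping matches, `%%%z` is read as `%%` followed by `%z` and comes out as `%+0000`, while `%%z` comes out as `%z`. The weak point is the placeholder itself: if it can appear in the input or in any substituted value, step 3 will turn it into a stray `%`.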

@sbc100
Collaborator

sbc100 commented Feb 2, 2022

(I'm taking a stab at it)

sbc100 added a commit that referenced this issue Feb 2, 2022
sbc100 added a commit that referenced this issue Feb 3, 2022

3 participants