Exported int sometimes corrupted with -sMAIN_MODULE #22980

Closed
hoodmane opened this issue Nov 21, 2024 · 6 comments · Fixed by #23020

@hoodmane
Collaborator

hoodmane commented Nov 21, 2024

Sometimes referencing a C variable via HEAP32[_variable/4] doesn't work.
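
For readers unfamiliar with the idiom, here is a minimal sketch of what `HEAP32[_variable/4]` is expected to do. It is not Emscripten's runtime: the `memory` buffer stands in for the Wasm linear memory, and the byte address assigned to `_my_number` is an arbitrary 4-byte-aligned value chosen for illustration.

```javascript
// Stand-in for the Wasm linear memory that Emscripten exposes through HEAP32.
const memory = new ArrayBuffer(65536);
const HEAP32 = new Int32Array(memory);

// Hypothetical byte address of the exported C variable (assumption, illustration only).
const _my_number = 22128;

// Simulate what the compiled program does: store the int at that byte address.
// Wasm linear memory is little-endian, hence the `true` flag.
new DataView(memory).setInt32(_my_number, 123456, true);

// HEAP32 is indexed in 4-byte elements, so the byte address is divided by 4.
const viaDivide = HEAP32[_my_number / 4];
// An equivalent form that is often seen: shift right by 2.
const viaShift = HEAP32[_my_number >> 2];

console.log(viaDivide, viaShift);
```

The read only returns the right value if `_my_number` on the JS side holds the same address the compiled code uses, which is the assumption that breaks in this report.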

a.c

const int my_number = 123456;

b.c

extern const int my_number;
#include "stdio.h"

int main(void) {
    printf("my_number: %d\n", my_number);
}

pre.js

Module.preRun = () => {
    console.log("HEAP32[_my_number/4]:", HEAP32[_my_number/4]);
}

Compile, link, execute

emcc -fPIC -c a.c
emcc -fPIC -c b.c

LDFLAGS="\
    -sEXPORTED_FUNCTIONS=_main,_my_number \
    -sMAIN_MODULE=1 \
    --pre-js=pre.js \
"

emcc $LDFLAGS b.o a.o
node a.out.js

Edit: To reproduce, b.o must come before a.o on the link line.

Output

I got:

HEAP32[_my_number/4]: 1701209717
my_number: 123456

It should print:

HEAP32[_my_number/4]: 123456
my_number: 123456

Changes that fix it

  1. Drop -sMAIN_MODULE, or switch to -sMAIN_MODULE=2.
  2. Swap the order of the object files a.o and b.o so that the link command is emcc $LDFLAGS a.o b.o

If I drop the const in a.c, then it seems to return the same wrong number independent of the link order.

Version of emscripten/emsdk

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.72-git (65f3d78fed2d1786afe278ec20bbe13425d8a51c)
clang version 20.0.0git (https:/github.com/llvm/llvm-project 50866e84d1da8462aeb96607bf6d9e5bbd5869c5)
Target: wasm32-unknown-emscripten
Thread model: posix

and

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.68 (ceee49d2ecdab36a3feb85a684f8e5a453dde910)
clang version 20.0.0git (https:/github.com/llvm/llvm-project 5cc64bf60bc04b9315de3c679eb753de4d554a8a)
Target: wasm32-unknown-emscripten
Thread model: posix

hoodmane changed the title from "Exported const int sometimes corrupted with -sMAIN_MODULE" to "Exported int sometimes corrupted with -sMAIN_MODULE" on Nov 21, 2024

@sbc100
Collaborator

sbc100 commented Nov 21, 2024

I can't seem to reproduce this on tot:

$ cat build.sh 
./emcc -fPIC -c a.c
./emcc -fPIC -c b.c

LDFLAGS="\
    -sEXPORTED_FUNCTIONS=_main,_my_number \
    -sMAIN_MODULE=1 \
    --pre-js=pre.js \
"

./emcc $LDFLAGS a.o b.o
node a.out.js
$ sh build.sh 
emcc: warning: EXPORTED_FUNCTIONS is not valid with LINKABLE set (normally due to SIDE_MODULE=1/MAIN_MODULE=1) since all functions are exported this mode.  To export only a subset use SIDE_MODULE=2/MAIN_MODULE=2 [-Wunused-command-line-argument]
HEAP32[_my_number/4]: 123456
my_number: 123456

@hoodmane
Collaborator Author

I am still getting this on tip of tree:

emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.72-git (079e2660071d5ae7d45d8e04efc09dd7e67456dd)

@hoodmane
Collaborator Author

I'm sorry, I got the order to link the object files wrong in the original report.
Here's a self contained script:

#!/bin/bash
rm -rf build && mkdir build && cd build
echo "const int my_number = 123456;" > a.c
cat << EOF > b.c
extern const int my_number;
#include "stdio.h"

int main(void) {
    printf("my_number: %d\n", my_number);
}
EOF
cat << EOF > pre.js
Module.preRun = () => {
    console.log("HEAP32[_my_number/4]:", HEAP32[_my_number/4]);
}
EOF

emcc -fPIC -c a.c
emcc -fPIC -c b.c


LDFLAGS="\
    -sEXPORTED_FUNCTIONS=_main,_my_number \
    -sMAIN_MODULE=1 \
    --pre-js=pre.js \
"

if test $# -eq 0; then
    echo "This works fine:"
    emcc $LDFLAGS a.o b.o
else
    echo "This does not work:"
    emcc $LDFLAGS b.o a.o 
fi
node a.out.js

If you run it with no arguments, you get the correct output. If you run it with one or more arguments, you get the wrong output.

@hoodmane
Collaborator Author

@sbc100 could you try again to see if you can reproduce this?

@sbc100
Collaborator

sbc100 commented Nov 26, 2024

Yes, I have been able to reproduce it. I simplified it a little by removing the pre.js file:

$ cat test.sh 
#!/bin/bash
echo "const int my_number = 123456;" > num.c
cat << EOF > main.c
#include "stdio.h"
#include <emscripten/em_asm.h>
extern const int my_number;

int main(void) {
  EM_ASM(console.log("JS:_my_number:", _my_number, HEAP32[_my_number/4]));
  printf("C: my_number: %ld %d\n", (long)&my_number, my_number);
}
EOF

set -e
set -x

./emcc -fPIC -c main.c
./emcc -fPIC -c num.c

LDFLAGS="-sEXPORTED_FUNCTIONS=_main,_my_number -sMAIN_MODULE=1"

./emcc $LDFLAGS main.o num.o
node a.out.js
$ sh test.sh 
+ ./emcc -fPIC -c main.c
+ ./emcc -fPIC -c num.c
+ LDFLAGS='-sEXPORTED_FUNCTIONS=_main,_my_number -sMAIN_MODULE=1'
+ ./emcc -sEXPORTED_FUNCTIONS=_main,_my_number -sMAIN_MODULE=1 main.o num.o
emcc: warning: EXPORTED_FUNCTIONS is not valid with LINKABLE set (normally due to SIDE_MODULE=1/MAIN_MODULE=1) since all functions are exported this mode.  To export only a subset use SIDE_MODULE=2/MAIN_MODULE=2 [-Wunused-command-line-argument]
+ node a.out.js
JS:_my_number: 3660 1634878572
C: my_number: 22128 123456
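
One way to sanity-check the bogus values is to look at the four bytes each one occupies in memory. The sketch below decodes the two wrong readings from this thread (1701209717 and 1634878572) as little-endian bytes; `int32ToAscii` is a hypothetical helper name, not part of Emscripten.

```javascript
// Return the four little-endian bytes of a 32-bit integer as an ASCII string.
function int32ToAscii(value) {
  const bytes = new Uint8Array(4);
  new DataView(bytes.buffer).setInt32(0, value, true);
  return String.fromCharCode(...bytes);
}

// Wrong value from the original report.
console.log(int32ToAscii(1701209717)); // "uffe"
// Wrong value from the simplified repro.
console.log(int32ToAscii(1634878572)); // "lDra"
```

Both values decode to printable ASCII, which is consistent with (though not proof of) the JS-side address pointing at unrelated data such as string literals rather than at the variable itself. That fits the log above, where JS sees address 3660 while C sees 22128.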

@sbc100
Collaborator

sbc100 commented Nov 26, 2024

An even simpler repro with just a single source file:

#!/bin/bash
cat << EOF > main.c
#include "stdio.h"
#include <emscripten.h>
#include <emscripten/em_asm.h>

EMSCRIPTEN_KEEPALIVE int my_number = 123456;

int main(void) {
  EM_ASM(console.log("JS:_my_number:", _my_number, HEAP32[_my_number/4]));
  printf("C: my_number: %ld %d\n", (long)&my_number, my_number);
}
EOF

set -e
set -x

./emcc -sMAIN_MODULE main.c
node a.out.js

I found the issue and have a fix.

sbc100 added a commit to sbc100/emscripten that referenced this issue Nov 26, 2024
In `-sMAIN_MODULE=1` mode we actually link twice, once to get the
names of the user-exported symbols, then again with everything exported.

We then use this `base_metadata` to limit the things that we
export on the JS module.  However, for data symbols we cannot use the
addresses/values present in `base_metadata`.

Fixes: emscripten-core#22980
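
The idea in the commit message can be sketched as follows. This is an illustration only, not the actual Emscripten code (which lives in its Python tooling): the object shapes, the field name `dataExports`, both function names, and `_other_symbol` with its address are hypothetical. The two `_my_number` addresses mirror the log earlier in the thread, on the assumption that 3660 came from the first link.

```javascript
// Hypothetical metadata from the first link (user-requested exports only).
const baseMetadata = { dataExports: { _my_number: 3660 } };
// Hypothetical metadata from the second link (everything exported, final layout).
const finalMetadata = { dataExports: { _my_number: 22128, _other_symbol: 1024 } };

// Problematic approach: reuse both the names and the addresses from the first link.
function exportsFromBaseOnly(base) {
  return { ...base.dataExports };
}

// Sketch of the corrected approach: the first link only decides WHICH names
// are exposed to JS; the addresses are taken from the final link.
function exportsWithFinalAddresses(base, final) {
  const result = {};
  for (const name of Object.keys(base.dataExports)) {
    if (name in final.dataExports) {
      result[name] = final.dataExports[name];
    }
  }
  return result;
}

const stale = exportsFromBaseOnly(baseMetadata);
const fixed = exportsWithFinalAddresses(baseMetadata, finalMetadata);
console.log(stale._my_number, fixed._my_number);
```

With the stale address, `HEAP32[_my_number/4]` reads whatever happens to live at that location in the final layout; with the address from the final link it reads the variable itself.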

sbc100 closed this as completed in ca8fd33 on Nov 26, 2024