Skip to content

Remove DYNAMICTOP_PTR #12057

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 46 commits into from
Sep 2, 2020
Merged

Remove DYNAMICTOP_PTR #12057

merged 46 commits into from
Sep 2, 2020

Conversation

kripken
Copy link
Member

@kripken kripken commented Aug 26, 2020

DYNAMICTOP_PTR has been a pointer to where DYNAMICTOP is stored,
which represented the top of "dynamic" memory (not static, and not stack; or
in other words, dynamic == managed by sbrk/malloc). We represented it on the
JS side so that we could allocate from JS directly, in particular during startup.
However, that has caused a bunch of complexity that is not really worth it, and
it's also work done after link that we'd like to remove (WebAssembly/binaryen#3043).

This PR moves us to a place where malloc-like allocation is disallowed from
JS during startup. Instead, JS should call sbrk or malloc normally (which can
only be done after startup).

This change removes the final static allocation from JS, which means that we
no longer adjust memory layout after link, and therefore this PR removes the
special updating of the stack pointer and sbrk location that we used to do.
(This just removes the --pass-arg stuff for those in emcc.py, to show that
this works and passes tests - there is a bunch more code that can be removed
but is deferred to keep this as small as possible.)

Dynamic linking, however is an exception: we need to allocate room for
dynamic libraries during startup (the main module needs them before it
starts to run). To support that, this PR still keeps around getMemory()
(which does a dynamic allocation during startup) and dynamicAlloc in
a simplified form. That form just updates __heap_base as we allocate,
and then when the main program starts, __heap_base is a normal
extern global that it receives, and it initializes sbrk using that (after
that point, dynamic allocations are disallowed and asserted against).

This change should not be user-visible, except for removing the allocate()
JS function's ALLOC_DYNAMIC option (since we no longer allow that type
of allocation).

This can only land when it removes the last static allocation from JS,
which means it is blocked by #12042 and #12055.

This has a small code size benefit, basically it removes things like
these:

340,341d339
< var DYNAMIC_BASE = 5263904, DYNAMICTOP_PTR = 20992;
< 
361,362d358
< HEAP32[DYNAMICTOP_PTR >> 2] = DYNAMIC_BASE;
< 

The benefit to pthreads builds is larger as it removes a bunch of special
code for DYNAMICTOP_PTR that we had there.

Note that I'm not totally happy with the sbrk implementation, but I
tried some other approaches and they were worse for code size. But
ideas welcome.

@kripken kripken requested a review from sbc100 August 26, 2020 20:42
Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ❤️ this change. I've been wanting to do this for a very long time.

emmalloc_unclaimed_heap_memory: function() {
var dynamicTop = HEAPU32[_emscripten_get_sbrk_ptr()>>2];
var dynamicTop = _sbrk();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the idea of just use _sbrk() here.. in my various attempts at this I just moved _emscripten_get_sbrk_ptr into normal C code (which also works), but this is simpler.

assert(DYNAMICTOP_PTR);
assert(HEAP32[DYNAMICTOP_PTR>>2] <= HEAP8.length);
if (dest + bytes > brk) abort('segmentation fault, exceeded the top of the available dynamic heap when storing ' + bytes + ' bytes to address ' + dest + '. DYNAMICTOP=' + brk);
assert(brk);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert is no long meaningful I think since we are no longer dealing with a pointer here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it still has value. 0 is never a valid break location, and may indicate that we've failed to initialize sbrk properly.

In fact this could be expanded into assert(brk >= STACK_BASE) since the break has to be after globals and the stack. Or is that too much?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the stronger assert now, but I can remove it if you think I'm missing something.

assert(DYNAMICTOP_PTR);
assert(HEAP32[DYNAMICTOP_PTR>>2] <= HEAP8.length);
if (dest + bytes > brk) abort('segmentation fault, exceeded the top of the available dynamic heap when loading ' + bytes + ' bytes from address ' + dest + '. DYNAMICTOP=' + brk);
assert(brk);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -29,6 +29,23 @@
#define SET_ERRNO()
#endif

static intptr_t sbrk_val = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just do static intptr_t sbrk_val = (intptr_t)&__heap_base right here at least in the non-pthread case?

In fact, even in the pthread case I should work since it will generate an assignment from GOT.mem.__heap_base on startup

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can work, but is less good for code size it seems. The issue is that static intptr_t sbrk_val = (intptr_t)&__heap_base; adds that number into the memory segment initializer. If wasm-opt then manages to remove malloc and sbrk, it still can't remove that. So super-tiny programs end up almost empty, but with a couple of bytes in a memory segment for no obvious reason. We have a test that fails on this, in fact, the metadce test that exports nothing and expects to get an empty wasm file with no data and no functions.

So it's better for super-tiny programs to do it this way. And for large programs that use malloc, it doesn't matter much either way we do it. But, I'm not totally happy with this I admit...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have quite a strong preference for static intptr_t sbrk_val = (intptr_t)&__heap_base; ... since it lets the linker do the work of assign the correct value.

Also, in smaller programs that don't reference sbrk_val at all the linker can eliminate this global completely, no?

This kind of micro optimization doesn't seem worth it. There is also the cost in terms of wasm code size because the size of you emscripten_get_sbrk_ptr is bigger that just:

return &sbrk_val

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in smaller programs that don't reference sbrk_val at all the linker can eliminate this global completely, no?

No, because we link in malloc by default, and only remove it through metadce when possible.

I think it would be good to move towards a model that does not link in malloc by default, and can then do it the way we both prefer, but that's not trivial and I think should be kept separate from this. For now, I'd like to not regress code size or remove the existing test that breaks unless this is properly optimized.

(It's true this has a small cost in code size when malloc is included. But if malloc is there then it's a tiny addition to the much larger malloc; and if malloc isn't there, the saved code size with this optimization is much more noticeable.)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not convinced... the only programs that can benefit here are ones that don't use the heap at all (not just malloc... but any heap usage at all right?)

Do you think we need to be optimizing for such programs?

How about we add an option to not export malloc by default that such (very unusual) programs could use to get this saving?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MINIMAL_RUNTIME and STANDALONE_WASM programs already don't export malloc by default so won't be effected.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'm OK with the TODO here... but I'd perhaps rather invert it by added a TODO to emcc.py line 1408 that says "TODO: exporting malloc here causes sbrk_ptr to be included and it cannot be removed by metadce. Find a way to avoid exporting malloc by default?"

:)

But lgtm this way too if you feel strongly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not convinced... the only programs that can benefit here are ones that don't use the heap at all (not just malloc... but any heap usage at all right?) Do you think we need to be optimizing for such programs?

Yes, I think it's nice that we emit a tiny wasm for

(module
  (func "double_it" (param $x i32)
    (i32.mul (i32.const 2) (local.get $x))))

Such pure computational stuff shouldn't have data segments, and it would be a regression to change that.

But the good news is that I have a PR almost ready for this, in the nomalloc branch! So I don't think any TODO comment is needed here. It passes tests and shrinks code size, but I'll do some more careful work to try to avoid regressions before I open it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sgtm

@@ -41,7 +58,6 @@ void *sbrk(intptr_t increment) {
intptr_t expected;
while (1) {
#endif // __EMSCRIPTEN_PTHREADS__

intptr_t* sbrk_ptr = emscripten_get_sbrk_ptr();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also drop this line and just &sbrk_val everywhere in this function instead?

kripken added a commit to WebAssembly/binaryen that referenced this pull request Sep 2, 2020
See emscripten-core/emscripten#9039 (comment)

The valid stack area is a region [A, B] in memory. Previously we just checked that
new stack positions S were S >= A, which prevented us from growing too much
(the stack grows down). But that only worked if the growth was small enough to not
overflow and become a big unsigned value. This PR makes us check the other way
too, which requires us to know where the stack starts out at.

This still supports the old way of just passing in the growth limit. We can remove it
after the roll.

In principle this can all be done on the LLVM side too after emscripten-core/emscripten#12057
but I'm not sure of the details there, and this is easy to fix here and get testing up
(which can help with later LLVM work).

This helps emscripten-core/emscripten#11860 by allowing us to clean up
some fastcomp-specific stuff in tests.
@kripken kripken merged commit 0882510 into master Sep 2, 2020
@kripken kripken deleted the yolo2 branch September 2, 2020 21:50
sbc100 added a commit that referenced this pull request Oct 30, 2020
The statement that lived inside this block was removed in #12057.
sbc100 added a commit that referenced this pull request Oct 30, 2020
The statement that lived inside this block was removed in #12057.
sbc100 added a commit that referenced this pull request Apr 12, 2023
All this setting was doing was setting MALLOC=none so it doesn't seem
like it needs to exist as a separate setting.

I think it used to do more, for example, prior to #12790 it used was
used in some JS library code.  Another PR that removed a lot of usage
of this settings was #12057.

Simply removing it simplifies another change I'm working on: #18905.

Also, the docs were incorrect in stating it was ignored except under
MINIMAL_RUNTIME, but we had no testing for it.
sbc100 added a commit that referenced this pull request Apr 12, 2023
All this setting was doing was setting MALLOC=none so it doesn't seem
like it needs to exist as a separate setting.

I think it used to do more, for example, prior to #12790 it used was
used in some JS library code.  Another PR that removed a lot of usage
of this settings was #12057.

Simply removing it simplifies another change I'm working on: #18905.

Also, the docs were incorrect in stating it was ignored except under
MINIMAL_RUNTIME, but we had no testing for it.
sbc100 added a commit that referenced this pull request Apr 12, 2023
All this setting was doing was setting MALLOC=none so it doesn't seem
like it needs to exist as a separate setting.

I think it used to do more, for example, prior to #12790 it used was
used in some JS library code.  Another PR that removed a lot of usage
of this settings was #12057.

Simply removing it simplifies another change I'm working on: #18905.

Also, the docs were incorrect in stating it was ignored except under
MINIMAL_RUNTIME, but we had no testing for it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants