-
Notifications
You must be signed in to change notification settings - Fork 3.4k
Remove DYNAMICTOP_PTR #12057
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove DYNAMICTOP_PTR #12057
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ❤️ this change. I've been wanting to do this for a very long time.
emmalloc_unclaimed_heap_memory: function() { | ||
var dynamicTop = HEAPU32[_emscripten_get_sbrk_ptr()>>2]; | ||
var dynamicTop = _sbrk(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the idea of just use _sbrk()
here.. in my various attempts at this I just moved _emscripten_get_sbrk_ptr
into normal C code (which also works), but this is simpler.
src/runtime_safe_heap.js
Outdated
assert(DYNAMICTOP_PTR); | ||
assert(HEAP32[DYNAMICTOP_PTR>>2] <= HEAP8.length); | ||
if (dest + bytes > brk) abort('segmentation fault, exceeded the top of the available dynamic heap when storing ' + bytes + ' bytes to address ' + dest + '. DYNAMICTOP=' + brk); | ||
assert(brk); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assert is no long meaningful I think since we are no longer dealing with a pointer here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it still has value. 0 is never a valid break location, and may indicate that we've failed to initialize sbrk properly.
In fact this could be expanded into assert(brk >= STACK_BASE)
since the break has to be after globals and the stack. Or is that too much?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the stronger assert now, but I can remove it if you think I'm missing something.
src/runtime_safe_heap.js
Outdated
assert(DYNAMICTOP_PTR); | ||
assert(HEAP32[DYNAMICTOP_PTR>>2] <= HEAP8.length); | ||
if (dest + bytes > brk) abort('segmentation fault, exceeded the top of the available dynamic heap when loading ' + bytes + ' bytes from address ' + dest + '. DYNAMICTOP=' + brk); | ||
assert(brk); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
@@ -29,6 +29,23 @@ | |||
#define SET_ERRNO() | |||
#endif | |||
|
|||
static intptr_t sbrk_val = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just do static intptr_t sbrk_val = (intptr_t)&__heap_base
right here at least in the non-pthread case?
In fact, even in the pthread case I should work since it will generate an assignment from GOT.mem.__heap_base
on startup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That can work, but is less good for code size it seems. The issue is that static intptr_t sbrk_val = (intptr_t)&__heap_base;
adds that number into the memory segment initializer. If wasm-opt then manages to remove malloc and sbrk, it still can't remove that. So super-tiny programs end up almost empty, but with a couple of bytes in a memory segment for no obvious reason. We have a test that fails on this, in fact, the metadce test that exports nothing and expects to get an empty wasm file with no data and no functions.
So it's better for super-tiny programs to do it this way. And for large programs that use malloc, it doesn't matter much either way we do it. But, I'm not totally happy with this I admit...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have quite a strong preference for static intptr_t sbrk_val = (intptr_t)&__heap_base;
... since it lets the linker do the work of assign the correct value.
Also, in smaller programs that don't reference sbrk_val
at all the linker can eliminate this global completely, no?
This kind of micro optimization doesn't seem worth it. There is also the cost in terms of wasm code size because the size of you emscripten_get_sbrk_ptr is bigger that just:
return &sbrk_val
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, in smaller programs that don't reference sbrk_val at all the linker can eliminate this global completely, no?
No, because we link in malloc by default, and only remove it through metadce when possible.
I think it would be good to move towards a model that does not link in malloc by default, and can then do it the way we both prefer, but that's not trivial and I think should be kept separate from this. For now, I'd like to not regress code size or remove the existing test that breaks unless this is properly optimized.
(It's true this has a small cost in code size when malloc is included. But if malloc is there then it's a tiny addition to the much larger malloc; and if malloc isn't there, the saved code size with this optimization is much more noticeable.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not convinced... the only programs that can benefit here are ones that don't use the heap at all (not just malloc... but any heap usage at all right?)
Do you think we need to be optimizing for such programs?
How about we add an option to not export malloc by default that such (very unusual) programs could use to get this saving?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MINIMAL_RUNTIME and STANDALONE_WASM programs already don't export malloc by default so won't be effected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I'm OK with the TODO here... but I'd perhaps rather invert it by added a TODO to emcc.py line 1408 that says "TODO: exporting malloc here causes sbrk_ptr to be included and it cannot be removed by metadce. Find a way to avoid exporting malloc by default?"
:)
But lgtm this way too if you feel strongly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not convinced... the only programs that can benefit here are ones that don't use the heap at all (not just malloc... but any heap usage at all right?) Do you think we need to be optimizing for such programs?
Yes, I think it's nice that we emit a tiny wasm for
(module
(func "double_it" (param $x i32)
(i32.mul (i32.const 2) (local.get $x))))
Such pure computational stuff shouldn't have data segments, and it would be a regression to change that.
But the good news is that I have a PR almost ready for this, in the nomalloc
branch! So I don't think any TODO comment is needed here. It passes tests and shrinks code size, but I'll do some more careful work to try to avoid regressions before I open it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sgtm
@@ -41,7 +58,6 @@ void *sbrk(intptr_t increment) { | |||
intptr_t expected; | |||
while (1) { | |||
#endif // __EMSCRIPTEN_PTHREADS__ | |||
|
|||
intptr_t* sbrk_ptr = emscripten_get_sbrk_ptr(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can also drop this line and just &sbrk_val
everywhere in this function instead?
See emscripten-core/emscripten#9039 (comment) The valid stack area is a region [A, B] in memory. Previously we just checked that new stack positions S were S >= A, which prevented us from growing too much (the stack grows down). But that only worked if the growth was small enough to not overflow and become a big unsigned value. This PR makes us check the other way too, which requires us to know where the stack starts out at. This still supports the old way of just passing in the growth limit. We can remove it after the roll. In principle this can all be done on the LLVM side too after emscripten-core/emscripten#12057 but I'm not sure of the details there, and this is easy to fix here and get testing up (which can help with later LLVM work). This helps emscripten-core/emscripten#11860 by allowing us to clean up some fastcomp-specific stuff in tests.
The statement that lived inside this block was removed in #12057.
The statement that lived inside this block was removed in #12057.
All this setting was doing was setting MALLOC=none so it doesn't seem like it needs to exist as a separate setting. I think it used to do more, for example, prior to #12790 it used was used in some JS library code. Another PR that removed a lot of usage of this settings was #12057. Simply removing it simplifies another change I'm working on: #18905. Also, the docs were incorrect in stating it was ignored except under MINIMAL_RUNTIME, but we had no testing for it.
All this setting was doing was setting MALLOC=none so it doesn't seem like it needs to exist as a separate setting. I think it used to do more, for example, prior to #12790 it used was used in some JS library code. Another PR that removed a lot of usage of this settings was #12057. Simply removing it simplifies another change I'm working on: #18905. Also, the docs were incorrect in stating it was ignored except under MINIMAL_RUNTIME, but we had no testing for it.
All this setting was doing was setting MALLOC=none so it doesn't seem like it needs to exist as a separate setting. I think it used to do more, for example, prior to #12790 it used was used in some JS library code. Another PR that removed a lot of usage of this settings was #12057. Simply removing it simplifies another change I'm working on: #18905. Also, the docs were incorrect in stating it was ignored except under MINIMAL_RUNTIME, but we had no testing for it.
DYNAMICTOP_PTR
has been a pointer to whereDYNAMICTOP
is stored,which represented the top of "dynamic" memory (not static, and not stack; or
in other words, dynamic == managed by sbrk/malloc). We represented it on the
JS side so that we could allocate from JS directly, in particular during startup.
However, that has caused a bunch of complexity that is not really worth it, and
it's also work done after link that we'd like to remove (WebAssembly/binaryen#3043).
This PR moves us to a place where malloc-like allocation is disallowed from
JS during startup. Instead, JS should call sbrk or malloc normally (which can
only be done after startup).
This change removes the final static allocation from JS, which means that we
no longer adjust memory layout after link, and therefore this PR removes the
special updating of the stack pointer and sbrk location that we used to do.
(This just removes the
--pass-arg
stuff for those inemcc.py
, to show thatthis works and passes tests - there is a bunch more code that can be removed
but is deferred to keep this as small as possible.)
Dynamic linking, however is an exception: we need to allocate room for
dynamic libraries during startup (the main module needs them before it
starts to run). To support that, this PR still keeps around
getMemory()
(which does a dynamic allocation during startup) and
dynamicAlloc
ina simplified form. That form just updates
__heap_base
as we allocate,and then when the main program starts,
__heap_base
is a normalextern global that it receives, and it initializes sbrk using that (after
that point, dynamic allocations are disallowed and asserted against).
This change should not be user-visible, except for removing the
allocate()
JS function's
ALLOC_DYNAMIC
option (since we no longer allow that typeof allocation).
This can only land when it removes the last static allocation from JS,
which means it is blocked by #12042 and #12055.
This has a small code size benefit, basically it removes things like
these:
The benefit to pthreads builds is larger as it removes a bunch of special
code for
DYNAMICTOP_PTR
that we had there.Note that I'm not totally happy with the sbrk implementation, but I
tried some other approaches and they were worse for code size. But
ideas welcome.