chouquette
Contributor

Hi,

This MR adds support for handling custom messages in pthread-based workers, similar to https://emscripten.org/docs/api_reference/module.html#Module.onCustomMessage, but available even when building without PROXY_TO_WORKER.

@kripken
Member

kripken commented Feb 14, 2022

I think this might be reasonable to add, especially since we've had something similar in another mode. @sbc100 what do you think?

If we decide to go with this, please update the docs under site/ in the location that you linked to in module.html.

Collaborator

@sbc100 sbc100 left a comment

I like the idea in general! I wonder about the naming though. Is onCustomMessage the best name? Do we have existing precedent for using this naming convention? I don't know that I have any great alternatives... maybe onPostMessage or onUserMessage (like the void* user_data from the C world)?

if (Module['onCustomMessage']) {
Module['onCustomMessage'](d);
} else {
throw 'Custom message received but worker Module.onCustomMessage not implemented.';
Collaborator

How about:

#if ASSERTIONS
assert(Module['onCustomMessage'], 'Custom message received but worker Module.onCustomMessage not defined.');
#endif
Module['onCustomMessage'](d);

Contributor Author

Done for both occurrences

</script>
{{{ SCRIPT }}}
</body>
</html>
Collaborator

I don't think we should need to add this html file .. I think we can just set Module.onCustomMessage in a pre-js, no? (that way the test will run on node too).

Contributor Author

Good point. I got that working for a browser test, but I suppose it shouldn't be a browser test if it's to be run with node?

Is core OK or should it be an 'other' test? (Or something else entirely?)

Contributor Author

I replaced the shell file with a --pre-js for now, let me know if I should move the test to another suite


EM_JS(void, run_test, (), {
function sendMessageToMainThread(cmd, payload) {
self.postMessage({
Collaborator

What is self here?

Contributor Author

The main worker (I believe?), but it appears to be useless, so let's remove it

Contributor Author

removed

@chouquette chouquette force-pushed the add_worker_on_custom_message branch from 33c5e99 to e8cbca2 (February 18, 2022 10:28)
@chouquette
Contributor Author

I wonder about the naming though

I'm fine with onUserMessage :) The main reason I went with onCustomMessage was that it was already existing but only for some configurations, and I didn't see a good reason not to keep the same name.

@chouquette
Contributor Author

chouquette commented Mar 2, 2022

Hi,

Gentle ping on this PR, please let me know if you want the naming, or anything else, to change :)

@@ -274,6 +274,11 @@ var LibraryPThread = {
if (Module['onAbort']) {
Module['onAbort'](d['arg']);
}
} else if (cmd === 'custom') {
Collaborator

Can you wrap this new block in #if expectToReceiveOnModule('onCustomMessage')

@@ -293,6 +293,11 @@ self.onmessage = (e) => {
if (Module['_pthread_self']()) { // If this thread is actually running?
Module['_emscripten_proxy_execute_queue'](e.data.queue);
}
} else if (e.data.cmd === 'custom') {
Collaborator

And this one.

@@ -4263,6 +4263,10 @@ def test_canvas_size_proxy(self):
def test_custom_messages_proxy(self):
self.btest(test_file('custom_messages_proxy.c'), expected='1', args=['--proxy-to-worker', '--shell-file', test_file('custom_messages_proxy_shell.html'), '--post-js', test_file('custom_messages_proxy_postjs.js')])

@requires_threads
def test_custom_message_worker(self):
self.btest(test_file('custom_messages_worker.c'), expected='1', args=['-sUSE_PTHREADS', '-sPTHREAD_POOL_SIZE=2', '--pre-js', test_file('custom_messages_worker_pre.js')])
Collaborator

I don't think you need a pthread pool here do you? Can you remove the PTHREAD_POOL_SIZE setting?

Can you use btest_exit here rather than btest (and remove the expected argument which will default to 0).

Collaborator

Can you also run this test under node in test_other.py?

@@ -4263,6 +4263,10 @@ def test_canvas_size_proxy(self):
def test_custom_messages_proxy(self):
self.btest(test_file('custom_messages_proxy.c'), expected='1', args=['--proxy-to-worker', '--shell-file', test_file('custom_messages_proxy_shell.html'), '--post-js', test_file('custom_messages_proxy_postjs.js')])

@requires_threads
def test_custom_message_worker(self):
self.btest(test_file('custom_messages_worker.c'), expected='1', args=['-sUSE_PTHREADS', '-sPTHREAD_POOL_SIZE=2', '--pre-js', test_file('custom_messages_worker_pre.js')])
Collaborator

Should we call this custom_message_pthread? Does that better describe what it's testing?

@@ -163,3 +163,5 @@ Other methods

When compiled with ``PROXY_TO_WORKER = 1`` (see `settings.js <https://github.com/emscripten-core/emscripten/blob/main/src/settings.js>`_), this callback (which should be implemented on both the client and worker's ``Module`` object) allows sending custom messages and data between the web worker and the main thread (using the ``postCustomMessage`` function defined in `proxyClient.js <https://github.com/emscripten-core/emscripten/blob/main/src/proxyClient.js>`_ and `proxyWorker.js <https://github.com/emscripten-core/emscripten/blob/main/src/proxyWorker.js>`_).

When compiled with ``USE_PTHREADS = 1`` (see `settings.js <https://github.com/emscripten-core/emscripten/blob/main/src/settings.js>`_), this callback will be invoked when a message containing the command ``custom`` is received. It allows to send messages back and forth between workers and the main thread using the ``Worker.postMessage`` function.
Collaborator

Drop the = 1 here.. it's not needed.

@juj
Collaborator

juj commented Mar 4, 2022

Apologies, but I feel strongly against merging this.

The issue here is that layering new functionality on Module does not scale well, and does not DCE at all. There is a lot of bad history in Emscripten design from its early days that used Module as a general hub for sharing information across random places, and that is what has led a lot of people to vocally complain that Emscripten is complex and bloated.

There are a number of blog posts that have found themselves wanting to ridicule Emscripten for the fact that the tiniest "hello world" printf apps produce large output code sizes, so there has been a lot of work going in to ensure that what people perceive as bloat is being actively reduced.

Because of that, I don't think we should have this kind of addition merged in, since it increases code size for all pthreads users. While the code size increase is "just linear" in the number of bytes added, the cognitive load to read the build output increases superlinearly really fast.

I would recommend instead adopting an approach of using existing library functions. We already have a number of different APIs for proxying and sending messages between Workers, couldn't one of those be used instead?

@juj
Collaborator

juj commented Mar 4, 2022

As for the existing Module.onCustomMessage API, it would be good for that to go the route of deprecation in the future, for the same reasons.

@sbc100
Collaborator

sbc100 commented Mar 4, 2022

We recently developed an approach that allows us to extend/use the incoming module API in an opt-in way that doesn't bloat the code for users unless they explicitly opt into it.

The technique was enabled by this change: #16346

And first used here: #16361

This means that we should never need to increase the default ALL_INCOMING_MODULE_JS_API .. and in fact we can probably shrink it over time to reduce the default code size.

Given that we have this opt-in mechanism now, I think these kinds of changes are a lot more acceptable.

We can have a separate debate about "should we allow users to hook directly into the postMessage loop".. but there is no (default) code size bloat associated with this change if we decide the answer is yes.

@sbc100
Collaborator

sbc100 commented Mar 4, 2022

BTW, I totally agree that these kinds of changes are not acceptable if they increase the code size by default.

@juj
Collaborator

juj commented Mar 4, 2022

We can have a separate debate about "should we allow users to hook directly into the postMessage loop"

It should always have been the case that people can directly inject their own postMessage events. All the message event handlers that Emscripten Worker-based APIs have should use their own dedicated message detection mechanism to play nice with custom user submitted events. I think this is still true with all the APIs.

One thing in particular is that Emscripten should not be assigning worker.onmessage or self.onmessage, but should instead call .addEventListener('message', ...) so that if the user has existing JS code that expects to own the .onmessage variable, it can do so without issues.

Restricting users from being able to submit custom postMessages would be limiting from a site extensibility viewpoint.
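
To illustrate the point about not taking over .onmessage, here is a minimal sketch that runs in plain Node without Emscripten. FakeWorker and FakeMessage are hypothetical stand-ins (not Emscripten or browser APIs) that mimic an object with a single onmessage slot plus regular event listeners. It only shows that a listener added with addEventListener and a user-owned onmessage handler can coexist.

```javascript
// Hypothetical stand-in for a Worker: an EventTarget with one `onmessage` slot.
class FakeWorker extends EventTarget {
  #slot = null;
  constructor() {
    super();
    // Forward 'message' events to whatever currently occupies the slot.
    this.addEventListener('message', (e) => { if (this.#slot) this.#slot(e); });
  }
  get onmessage() { return this.#slot; }
  set onmessage(fn) { this.#slot = fn; }
}

// Hypothetical stand-in for MessageEvent carrying a `data` payload.
class FakeMessage extends Event {
  constructor(data) { super('message'); this.data = data; }
}

const calls = [];
const worker = new FakeWorker();

// "Library" code registers a listener instead of claiming the slot.
worker.addEventListener('message', (e) => calls.push('library:' + e.data.cmd));

// "User" code that expects to own `.onmessage` keeps working.
worker.onmessage = (e) => calls.push('user:' + e.data.cmd);

worker.dispatchEvent(new FakeMessage({ cmd: 'hello' }));
console.log(calls); // both handlers have recorded the message
```

Had the library assigned worker.onmessage itself, the later user assignment would have silently replaced it, which is the conflict described above.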

We recently developed an approach that allows us to extend/use the incoming module API in an opt-in way that doesn't bloat the code for users unless they explicitly opt into it.

I do recall that, and I don't think it is the best solution tbh. It fixes complexity by adding more complexity. While it does fix the final build output in terms of code size, it does so by making Emscripten harder to use (a new INCOMING_MODULE_JS_API setting to have to worry about), and the source files (library_pthread.js, worker.js) still have the code complexity.

(Though now that I read this, I think the code size here does grow, and it is not adhering to the setting in INCOMING_MODULE_JS_API)

Note that I hope I am not setting up a double standard: I do also leverage custom -s settings like this, e.g. the upcoming WASM_WORKERS_NO_TLS whenever I need to add things that don't DCE well otherwise.

However I think the critical difference here is that such settings should be introduced only when we realize there is no other way to get to emit the code/feature otherwise. If that is the case, then I think the complexity is warranted. However in this case I think this use case can be solved with existing JS and C/C++ library functions without needing to add non-DCEing functionality?

(Or if not, my apologies, but in that case, I hope we can look in a bit more detail at the specific use case to see why the existing message passing library functions will not cut it)

@sbc100
Collaborator

sbc100 commented Mar 4, 2022

(Though now that I read this, I think the code size here does grow, and it is not adhering to the setting in INCOMING_MODULE_JS_API)

See #16239 (comment). I was not planning on having this land without that change.

@sbc100
Collaborator

sbc100 commented Mar 4, 2022

Regarding the issue at hand, the ability to receive custom messages on a worker, I didn't know about addEventListener('message', ...). If that works, it could indeed mean that this change is not needed. We should add a test to ensure it does.

@chouquette
Contributor Author

Hi and sorry about the bit of delay.

Indeed the addEventListener('message', ...) way is working, and should be the correct one since it requires less intrusive changes, however the problem with that approach is that the main message handler will trigger the

        else {
          err("worker sent an unknown command " + cmd);
        }

path. I'm not entirely at ease with removing the error in case of an unknown message, and moving the error into a build-setting-dependent block doesn't seem too user friendly.

I'm unsure what's the way to go from here, but it seems that this MR should be closed as most of its code will be removed anyway

@juj
Collaborator

juj commented Mar 8, 2022

Indeed the addEventListener('message', ...) way is working

Great, that's good to hear!

however the problem with that approach is that the main message handler will trigger the

Oops, that looks like a bug.. the error message should only trigger if receiving a message that looks like it should be handled by the library_pthread.js message listener. Posted #16450 to fix that. Does that help?
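
As a rough sketch of the behavior described here (not necessarily what #16450 implements), a library-owned listener could treat the presence of a cmd field as the sign that a message is addressed to it, and stay silent otherwise. KNOWN_COMMANDS, exampleCommand and classifyMessage are hypothetical names used only for illustration.

```javascript
// Hypothetical command table; the real library has its own set of commands.
const KNOWN_COMMANDS = new Set(['exampleCommand']);

// Decide what a library-owned listener should do with an incoming payload.
function classifyMessage(data, knownCommands) {
  const cmd = data && data.cmd;
  if (!cmd) return 'ignored';            // not addressed to the library: stay silent
  if (knownCommands.has(cmd)) return 'handled';
  return 'unknown-command';              // only this case would report an error
}

console.log(classifyMessage({ cmd: 'exampleCommand' }, KNOWN_COMMANDS));
console.log(classifyMessage({ cmd: 'typo' }, KNOWN_COMMANDS));
console.log(classifyMessage({ frame: 1 }, KNOWN_COMMANDS));
```

Under this assumption, user messages sent through postMessage pass through untouched as long as they do not carry a cmd field of their own.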

@sbc100
Collaborator

sbc100 commented Mar 8, 2022

However I think the critical difference here is that such settings should be introduced only when we realize there is no other way to get to emit the code/feature otherwise. If that is the case, then I think the complexity is warranted. However in this case I think this use case can be solved with existing JS and C/C++ library functions without needing to add non-DCEing functionality?

I agree that using INCOMING_MODULE_JS_API should be a last resort. If there is a better/easier way to inject the customization I'm all for it.

How do you envisage a user calling .addEventListener('message', ...), though? Do we want to recommend that folks use the mappings in library_pthread.js to look up and manipulate the worker objects that back the pthreads? I was hoping we could consider those details internal. Perhaps we should have a supported API for getting access to the worker that is running a given pthread?

@chouquette
Contributor Author

chouquette commented Mar 9, 2022

Posted #16450 to fix that. Does that help?

Apparently yes! Thanks

However, I spoke too soon when I said that addEventListener is working. I can add some additional listeners from pthread, but I failed to add an extra listener for the main thread.

Using postMessage from a pthread invokes the handler defined in library_pthread.js correctly, but I didn't manage to invoke any custom handler. My understanding of JavaScript might be the issue here though 😅

AFAIU I should add the event handler to the Worker instance that represents the main thread but I'm a bit confused there, in my case the main thread isn't supposed to be a pthread (I don't build with PROXY_TO_WORKER), yet messages sent from a worker/pthread appear to be received in the main thread (I can list all other running, which if I understood correctly denotes that the code is running in the main thread)

If I add a listener through the window object, it doesn't receive any messages sent by workers. I'm not sure what I'm missing but I could definitely use some help. (in the event this would be easier in a real time conversation I'm present on your discord server using the same nick)

@sbc100
Collaborator

sbc100 commented Mar 9, 2022

Posted #16450 to fix that. Does that help?

Apparently yes! Thanks

However, I spoke too soon when I said that addEventListener is working. I can add some additional listeners from pthread, but I failed to add an extra listener for the main thread.

Using postMessage from a pthread invokes the handler defined in library_pthread.js correctly, but I didn't manage to invoke any custom handler. My understanding of JavaScript might be the issue here though 😅

AFAIU I should add the event handler to the Worker instance that represents the main thread but I'm a bit confused there, in my case the main thread isn't supposed to be a pthread (I don't build with PROXY_TO_WORKER), yet messages sent from a worker/pthread appear to be received in the main thread (I can list all other running, which if I understood correctly denotes that the code is running in the main thread)

If I add a listener through the window object, it doesn't receive any messages sent by workers. I'm not sure what I'm missing but I could definitely use some help. (in the event this would be easier in a real time conversation I'm present on your discord server using the same nick)

I think you would need to somehow attach your event handler to each new worker object that gets created. These event listeners are added in library_pthread.js. See loadWasmModuleToWorker. The question I have is how best to inject your extra handler, or add it to workers as they are created.

@chouquette
Contributor Author

I think you would need to somehow attach your event handler to each new worker object that gets created.

This should work indeed, but it would still require the user to be able to inject their handler into emscripten somehow, no? I was hoping to achieve something less intrusive through addEventListener, ideally without modifying emscripten.

To put it another way, I don't really see the difference between the original attempt in this MR and exposing another event handler to all workers. I'll do another pass at the previous comments with a rested head tomorrow as I might have missed something

@sbc100
Collaborator

sbc100 commented Mar 9, 2022

I think you would need to somehow attach your event handler to each new worker object that gets created.

This should work indeed, but it would still require the user to be able to inject their handler into emscripten somehow, no? I was hoping to achieve something less intrusive through addEventListener, ideally without modifying emscripten.

Yes, according to the discussion happening over on #16450 it should be possible for you to call addEventListener on new workers as they are created. As of today I think you would need to do something like PThread.pthreads[pthread_ptr].worker to get access to the worker of a given thread.. with that you should be able to do addEventListener?

@chouquette
Contributor Author

That was my initial attempt, but so far Module.PThread.pthreads[Module._pthread_self()]; yields undefined from the main thread.

From other threads that's not an issue though

@sbc100
Collaborator

sbc100 commented Mar 9, 2022

That was my initial attempt, but so far Module.PThread.pthreads[Module._pthread_self()]; yields undefined from the main thread.

Yes that is expected, the handler would only need to be installed from the main thread and on the workers it owns.

@chouquette
Contributor Author

Doesn't that mean that it's not possible to add a message handler for the main thread?

To try and clarify, I'm trying to send a message from a worker to the main thread in order to transfer some objects.

The message is sent & correctly received by the worker.onmessage handler from the main thread, but any handler I register from the HTML page is not invoked, while I was able to do so using cmd: custom and registering a handler in Module['onCustomMessage']

Again, sorry if I'm missing something

@chouquette
Contributor Author

Oh I think I'm starting to understand my confusion, feel free to ignore my last comment for the time being, sorry about that

@chouquette
Contributor Author

chouquette commented Mar 11, 2022

I can now confirm that this patchset is not required.

For the sake of explaining my confusion, should it be useful to anyone else, my main mistake was to assume that worker.postMessage would cause the handler to be executed in the main thread but with a different worker object (i.e. the worker that addEventListener would have been invoked on). This is why I was struggling to find a Worker instance that I could invoke addEventListener on.

However, in reality, the event handler is invoked on the same worker object, but from a different thread. This means that when the handler is invoked, it can access things that are only accessible from the main thread, so it's fairly easy to find the correct worker object knowing the thread ID.

TL;DR I can do what I want with this kind of code:

    MAIN_THREAD_EM_ASM({
        Module.PThread.pthreads[$0].worker.addEventListener('message', function (e) {
            // handle the event in the main thread context
        });
    }, pthread_self());

and later on invoke the handler through the usual postMessage
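
For readers who want to try the lookup logic outside of Emscripten, the following is a small simulation in plain Node. fakePThread, listenToThread, SimulatedMessage and the thread ids 42 and 1 are all hypothetical stand-ins. The real table is Module.PThread.pthreads, which was described earlier in this thread as an internal detail, so its shape is an assumption here.

```javascript
// Hypothetical stand-in for MessageEvent carrying a `data` payload.
class SimulatedMessage extends Event {
  constructor(data) { super('message'); this.data = data; }
}

// Hypothetical stand-in for Module.PThread: thread id -> { worker }.
const fakePThread = { pthreads: { 42: { worker: new EventTarget() } } };

// Attach a user handler to the worker backing a given thread id.
// Returns false when the id has no entry (as reported for the main thread).
function listenToThread(pthreadTable, threadId, handler) {
  const entry = pthreadTable.pthreads[threadId];
  if (!entry) return false;
  entry.worker.addEventListener('message', handler);
  return true;
}

const received = [];
const attached = listenToThread(fakePThread, 42, (e) => received.push(e.data));
const attachedMain = listenToThread(fakePThread, 1, () => {});

// Simulate the pthread posting a message that arrives on its worker object.
fakePThread.pthreads[42].worker.dispatchEvent(new SimulatedMessage({ payload: 'frame' }));
console.log(attached, attachedMain, received);
```

The handler is registered on the worker object that backs the sending thread, which matches the observation above that the event is delivered on that same worker object in the main thread context.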

I hope this makes sense and can help someone struggling with starting with emscripten as I am 😅

Thanks a lot for your help and time with this! I'll now close the MR
