Refactor futures executor #168

Merged · merged 10 commits into koute:master on Mar 28, 2018

Conversation

@Diggsey (Contributor) commented Mar 22, 2018

(View individual commits for a useful diff.)

With the recent performance improvements, some unnecessary complexity was introduced:

  • There's no need for tasks to carry around a reference to their executor, since there is a single global executor.
  • The executor does not need to keep track of whether it is currently running.
  • The "inner" state of the executor does not need to be reference counted.
  • The names of various things (PromiseExecutor, Notifier, etc.) have become out of date (e.g. this is no longer an executor based purely around promises).

With this complexity removed, I was also able to remove some redundant clones and add some comments (e.g. why we're using MutationObserver).

cc @Pauan, you may wish to verify that this does not regress performance in any way.

Also, thoughts on upgrading to futures 0.2.0-beta? Upgrading would allow us to remove the unsafe code here: the new version provides a safe implementation of Notify (now called Wake) for reference-counted types, so we wouldn't have to write our own.

@koute (Owner) commented Mar 22, 2018

Thanks a lot for the PR!

@Pauan Since you're the one who's the most familiar with this code, it'd be great if you could take a look at this and review it.

Also, thoughts on upgrading to futures 0.2.0-beta?

Yes! I would very much like to migrate to it as soon as possible. (Hopefully 0.2.0 won't be that much different compared to 0.2.0-beta?) If you (or anyone else) would want to work on it I would be very happy to accept a PR.

@koute (Owner) commented Mar 22, 2018

Also, since I've merged @Pauan's other PR this started to have merge conflicts, so you'll have to resolve them. Sorry. (:

@Pauan (Contributor) left a comment:

There's no need for tasks to carry around a reference to their executor, since there is a single global executor. [...] The "inner" state of the executor does not need to be reference counted.

When wasm gets multiple threads, your implementation will break. It will compile correctly, and it (probably) won't have any runtime errors, but it will silently have different behavior.

I would rather have an Executor that works correctly across multiple threads. That makes it more useful when wasm gets multiple threads, and it's also useful for porting our Executor implementation to other hosts (the only JS/wasm specific thing is the waker function).

Since the Rc is only created once, the overhead for making it thread-safe is extremely small.
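
For illustration, a minimal sketch of the kind of change being suggested, swapping Rc/RefCell for Arc/Mutex (the type and field names here are hypothetical, not taken from this PR):

use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the executor's real task type.
struct SpawnedTask;

// A thread-safe variant of the executor's shared state: the queue is guarded
// by a Mutex and the whole thing is reference counted with Arc, so handles
// could be sent to worker threads.
struct SharedEventLoop {
    microtask_queue: Mutex<VecDeque<Arc<SpawnedTask>>>,
}

fn main() {
    let shared = Arc::new(SharedEventLoop {
        microtask_queue: Mutex::new(VecDeque::new()),
    });
    // The Arc is created once; each subsequent clone is a single atomic increment.
    let handle = Arc::clone(&shared);
    handle.microtask_queue.lock().unwrap().push_back(Arc::new(SpawnedTask));
}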

The executor does not need to keep track of whether it is currently running.

That's a good change.

The names of various things (PromiseExecutor, Notifier, etc.) have become out of date. (eg. this is no longer an executor purely based around promises.)

I agree.

// A proxy for the javascript event loop.
struct EventLoop;

// There's only one thread, but this lets us tell the compiler that we
Contributor:

This note doesn't seem correct. thread_local has nothing to do with "telling the compiler that we don't need a Sync bound".

Instead, it doesn't need a Sync bound because it will actually create a new value on each thread; if there are 5 threads it might create 5 EventLoopInners.

Contributor (author):

In the absence of multiple threads, a "thread-local" is a way of telling the compiler that globals do not need to be thread-safe (i.e. not Sync), whilst also achieving lazy initialisation, so I think the comment is pretty accurate.
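
As a rough standalone illustration (a sketch with made-up names, not code from this PR): a !Sync type can live behind thread_local! because each thread lazily initialises its own copy, so no Sync bound is ever required:

use std::cell::RefCell;

struct EventLoopState {
    // RefCell is !Sync, so this whole type is !Sync.
    queue: RefCell<Vec<u32>>,
}

thread_local! {
    // Lazily initialised on first access from each thread; since no other
    // thread can ever observe this value, Sync is not needed.
    static EVENT_LOOP: EventLoopState = EventLoopState {
        queue: RefCell::new(Vec::new()),
    };
}

fn push(value: u32) {
    EVENT_LOOP.with(|state| state.queue.borrow_mut().push(value));
}

fn main() {
    push(1);
}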

}
}

// State relating to the javascript event loop. Only one instance ever exists.
Contributor:

When wasm gets multiple threads this will no longer be true.

Contributor (author):

When wasm gets multiple threads we can re-evaluate the implementation.

}

impl EventLoopInner {
// Initializes the event loop. Only called once.
Contributor:

Also not true when wasm gets multiple threads.

var node = document.createTextNode( "0" );
var state = false;

new MutationObserver( wrapper ).observe( node, { characterData: true } );
Contributor:

I know the original code doesn't clean this up, but it really should clean up the MutationObserver when the EventLoopInner is dropped.

Contributor (author):

This will happen automatically - a MutationObserver is garbage collected when its target node is collected. The target node will be garbage collected when the waker is dropped. The waker will be dropped when the EventLoopInner is dropped.

} else {
var promise = Promise.resolve( null );

function nextTick( value ) {
Contributor:

This shouldn't take a value argument.

Contributor (author):

Yeah

self.microtask_queue.borrow_mut().pop_front()
}
// Poll the queue until it is empty
fn drain(&self) {
Contributor:

This should clean up the memory for the queue. Otherwise it will waste memory if a large number of Tasks are queued (since it will grow the memory, but it will never deallocate the memory).

I had an idea of only cleaning up the queue memory if a certain amount of time has passed (e.g. 5 seconds).

Contributor (author):

Interesting, but I think that can wait for a second PR. I'd prefer not to tie it to wall-clock time, perhaps "shrinking the queue if the maximum fill ratio from the last N iterations drops below 50%" would be a good metric.

Contributor:

In that case why not just add back in my implementation of always cleaning the queue? That's simple to add and we can always refine it later.

var promise = Promise.resolve( null );

function nextTick( value ) {
promise.then( wrapper );
Contributor:

When the EventLoopInner is dropped while the Promise is pending, it will cause a runtime panic. So in that situation it shouldn't call the callback.

Contributor (author):

Right now the EventLoopInner is never dropped (since the main thread never exits) but I will adjust this anyway.

static EVENT_LOOP_INNER: EventLoopInner = EventLoopInner::new();
}

impl EventLoop {
Contributor:

These functions should probably be marked as #[inline]

Contributor (author):

As above

}
}

fn notify( task: Rc< SpawnedTask > ) {
Contributor:

This should probably be marked #[inline]

Contributor (author):

Until we find a performance issue due to poor codegen, I'd rather let the compiler decide whether to inline it.

}
}

// A proxy for the javascript event loop.
Contributor:

Nit: javascript -> JavaScript

@Diggsey (Contributor, author) commented Mar 24, 2018

@Pauan most of your complaints seem to be around multiple threads: wasm doesn't have multiple threads, and it's not clear yet that it will ever get anything beyond web-workers (this PR is compatible with web-workers), so I don't see any reason to take that into consideration; the implementation it replaces was not compatible with threads anyway.

@Pauan (Contributor) commented Mar 24, 2018

@Diggsey That's not true. With WebWorkers it's possible to share memory between WebWorkers, so the same multi-thread data race issues happen (and thus the usual Rust thread-safe abstractions need to be used).

In fact wasm already had multi-thread support until SharedArrayBuffer was temporarily disabled.

Let's suppose that WebAssembly does get threads. If your implementation caused compile-time or run-time errors, then I would be perfectly fine with assuming a single-threaded host.

But that's not the case. Because your implementation will cause silent and subtle breakage, it can be quite easy for us to not notice and forget to add in proper thread-safe support later.

Because we will need to support multi-threading at some point in the future, why not implement it now and avoid the future problems? The performance cost is incredibly negligible.

the implementation it replaces was not compatible with threads anyway.

Then how can it be made thread-safe? By changing SpawnedTask to use Arc? Or something deeper than that?

@Diggsey (Contributor, author) commented Mar 24, 2018

Then how can it be made thread-safe? By changing SpawnedTask to use Arc? Or something deeper than that?

It can't be; the rust stdlib is inherently single-threaded right now. Come back to this when rust supports multiple threads.

Let's suppose that WebAssembly does get threads. If your implementation caused compile-time or run-time errors, then I would be perfectly fine with assuming a single-threaded host.

There's no way to do that - the current rust (and possibly LLVM) implementation will already break horribly and silently if you try to use SharedArrayBuffer as your memory and then try to use it from multiple threads. The only guarantee we can provide is that there's no way to cause that breakage from the rust side.

@Pauan (Contributor) commented Mar 24, 2018

@Diggsey Yes... I know that the wasm implementation of Rust is single-threaded right now.

I'm saying that will change in the future, and probably sooner than you think. So I want to make sure that our implementation is up to snuff, rather than having silent breakage later.

@Diggsey (Contributor, author) commented Mar 24, 2018

I'm saying that will change in the future, and probably sooner than you think. So I want to make sure that our implementation is up to snuff, rather than having silent breakage later.

And I'm saying doing that now is a bad idea when there's no way to actually test our implementation, and we still don't know for sure what the constraints of that multi-threading model will be. When we have a nightly rust that supports multi-threaded wasm and an environment to run it in, then we can start thinking about it.

@Pauan (Contributor) commented Mar 24, 2018

You're right that there's no way to test the implementation right now, but we can base the implementation on existing things like the current_thread executor. That should ensure that it will work correctly.

@Diggsey (Contributor, author) commented Mar 24, 2018

The current_thread executor is quite different from this executor: it doesn't need to rely on thread-local storage because it can be created on the stack. It's also possible (although not advisable) to nest one current_thread executor within another. Finally, the wake-up mechanism can be implemented on the rust side.

The javascript executor is only superficially similar: we have to rely on thread-local storage, because a new javascript executor is automatically created for every thread, and we have no way to create our own executors. We have no control over the wake-up mechanism, it must go through an existing javascript mechanism.

The wake-up mechanism is particularly difficult: if we have a handle to a task running on another thread, we cannot use promises or DOM observers to wake up that thread. We must use an API such as Worker.postMessage - but we only want to fall back to that mechanism if it really is running on a different thread. Also, this means attaching a message listener on every thread, which must have some means to filter out messages meant for the executor vs those in use by the application. We also have to consider that the rust standard library will also have installed its own custom listeners and we must not conflict with that.

Now can you see why I have no desire to get into this complexity until we have a working foundation, and even attempting to do so now is a futile effort?

@Pauan (Contributor) commented Mar 24, 2018

Fine, at least put in a big huge TODO that makes it clear that this is not thread-safe and that it will need to be fixed later.

@Diggsey (Contributor, author) commented Mar 24, 2018

I added a file-level comment describing the executor and its limitations, and also implemented queue shrinking if the queue is under 50% utilised for 25 consecutive iterations.
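
For reference, a simplified standalone sketch of that heuristic (the constants mirror the diff hunks below, but the exact condition and call site were iterated on in the review comments that follow):

use std::collections::VecDeque;

const INITIAL_QUEUE_CAPACITY: usize = 10;
const QUEUE_SHRINK_DELAY: usize = 25;

struct Queue {
    items: VecDeque<u32>,
    shrink_counter: usize,
}

impl Queue {
    // If the queue has been under half full (and larger than the initial
    // capacity) for QUEUE_SHRINK_DELAY consecutive checks, give back the
    // excess capacity; otherwise reset the counter.
    fn shrink_if_necessary(&mut self) {
        let cap = self.items.capacity();
        if cap > self.items.len() * 2 && cap > INITIAL_QUEUE_CAPACITY {
            if self.shrink_counter < QUEUE_SHRINK_DELAY {
                self.shrink_counter += 1;
            } else {
                self.items.shrink_to_fit();
                self.shrink_counter = 0;
            }
        } else {
            self.shrink_counter = 0;
        }
    }
}

fn main() {
    let mut queue = Queue {
        items: VecDeque::with_capacity(INITIAL_QUEUE_CAPACITY),
        shrink_counter: 0,
    };
    queue.items.extend(0..1000); // a burst of work grows the queue...
    queue.items.clear();         // ...which then drains
    for _ in 0..=QUEUE_SHRINK_DELAY {
        queue.shrink_if_necessary();
    }
    assert!(queue.items.capacity() < 1000); // the excess capacity was given back
}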

@@ -0,0 +1,226 @@
// This file implements a futures-compatible executor which schedules futures
Contributor:

Add in a TODO for easier greppability.

Rc::new( Self {
is_queued: Cell::new( false ),
spawn: RefCell::new( Some( executor::spawn(
Box::new( future ) as BoxedFuture
Contributor:

I haven't tested it, but is it actually needed to convert this into a Box? Looking at the docs for spawn, it doesn't seem to require a Box.

Contributor (author):

spawn<T>(future: T) returns a Spawn<T> - i.e. it does not perform type-erasure. The box is needed to erase the type.
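
A small sketch of that point against the futures 0.1 API (the BoxedFuture alias mirrors the one in the diff above; spawn_erased is a hypothetical helper, not part of this PR):

extern crate futures; // futures 0.1

use futures::executor::{self, Spawn};
use futures::Future;

// Without boxing, executor::spawn(future) returns Spawn<F> for the concrete
// future type F, so every task would have a different type. Boxing first
// erases F, giving every task the same Spawn<BoxedFuture> type.
type BoxedFuture = Box<Future<Item = (), Error = ()>>;

fn spawn_erased<F>(future: F) -> Spawn<BoxedFuture>
where
    F: Future<Item = (), Error = ()> + 'static,
{
    executor::spawn(Box::new(future) as BoxedFuture)
}

fn main() {
    let _task = spawn_erased(futures::future::ok::<(), ()>(()));
}

With every task erased to the same type, the executor can keep them all in one homogeneous queue.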


new MutationObserver( wrapper ).observe( node, { characterData: true } );

function nextTick() {
Contributor:

Function declarations are block scoped, so this won't work. So either it needs to move the return nextTick; inside of the branches, or it needs to use var nextTick = ...

Contributor (author):

Actually they're not - function declarations are hoisted to the enclosing function/global scope. However, it looks like Edge does not implement this correctly so I will change this.

@Pauan (Contributor), Mar 25, 2018:

That is incorrect. Function declarations in blocks have always been invalid according to the ECMAScript spec. Browsers implemented it anyways, but with inconsistent behavior. Function declarations are always block scoped in strict mode, in all browsers.

fn pop_task(&self) -> Option< Rc< SpawnedTask > > {
self.microtask_queue.borrow_mut().pop_front()
}
fn shrink_if_necessary(&self) {
Contributor:

It should have a blank line above and below this function.

// `QUEUE_SHRINK_DELAY` iterations.
let shrink_counter = self.shrink_counter.get();
if shrink_counter < QUEUE_SHRINK_DELAY {
self.shrink_counter.set(shrink_counter+1);
Contributor:

It should have spaces around +

if shrink_counter < QUEUE_SHRINK_DELAY {
self.shrink_counter.set(shrink_counter+1);
} else {
queue.shrink_to_fit();
Contributor:

It shouldn't use shrink_to_fit; it should replace the queue with a new VecDeque with INITIAL_QUEUE_CAPACITY.

Contributor (author):

No it shouldn't, that will result in unnecessary allocations.

Contributor:

shrink_to_fit also deallocates + allocates, but less efficiently than recreating the VecDeque.

In addition, shrink_to_fit will cause it to have 0 capacity, but we want it to have INITIAL_QUEUE_CAPACITY capacity.

Contributor (author):

No, we want it to have a capacity equal to the last queue size, since that is a good estimate for future iterations. We could avoid some overhead by waiting to resize it until after draining the items, but this is a very cold code-path (it's only hit 25 iterations after sudden drops in the queue size).

Contributor:

Ahh, I see, you moved the call to shrink_if_necessary.

I still don't like that it might have a capacity lower than INITIAL_QUEUE_CAPACITY

Contributor (author):

There's actually not a good way to control the capacity this way. I've implemented the closest approximation instead, and opened rust-lang/rust#49385.

Contributor:

Sure there is... have the shrink_if_necessary method return an Option<usize>, and then after draining it will check if it is Some, and if so it will replace the old VecDeque with a new VecDeque with the appropriate capacity.

Contributor (author):

Yeah I didn't really want to leak this complexity into the drain function, but I've implemented that now.

let mut queue = self.microtask_queue.borrow_mut();
// We consider shrinking the queue if it is less than
// half full...
if queue.len() <= queue.capacity() / 2 {
Contributor:

This is wrong. The len will always be 0, because it's called after draining. It should instead shrink it if the capacity is larger than INITIAL_QUEUE_CAPACITY.

Contributor (author):

You're right about it reading the length at the wrong point, however using INITIAL_QUEUE_CAPACITY here will shrink the queue even if it is being fully utilised, which is wrong.

// Initial capacity of the event queue
const INITIAL_QUEUE_CAPACITY: usize = 10;
// Iterations to wait before allowing the queue to shrink
const QUEUE_SHRINK_DELAY: usize = 25;
Contributor:

There should be a TODO which says that we should test for the optimal sizes for these two constants.

@@ -178,6 +178,10 @@ impl EventLoopInner {
// Reclaim space from the queue if it's going to waste
fn shrink_if_necessary(&self) {
let mut queue = self.microtask_queue.borrow_mut();
if queue.capacity() <= INITIAL_QUEUE_CAPACITY {
Contributor:

This doesn't seem right. Let's say the queue has a capacity of 50 but a length of 1. This check will return false, but then later it calls shrink_to_fit which resizes it to a capacity of 1.

@Pauan (Contributor) left a comment:

Okay, great, it looks good. I just need to test that it works correctly and doesn't regress performance, and then I'll approve it.

let cap = queue.capacity();
// We consider shrinking the queue if it is less than
// half full...
if cap > queue.len()*2 && cap > INITIAL_QUEUE_CAPACITY {
Contributor:

This doesn't block my approval; however, it is possible for the capacity of the VecDeque to be higher than INITIAL_QUEUE_CAPACITY.

That means it will reallocate every 25 iterations, even if the queue never goes above INITIAL_QUEUE_CAPACITY elements.

Either this should be fixed, or a TODO note should be added.

Contributor (author):

And... this is why more control is needed over the capacity 😛 With a shrink_to method it wouldn't matter.

Contributor (author):

I fixed the issue by allowing it up to 2*INITIAL_QUEUE_CAPACITY without trying to shrink (VecDeque uses powers of two).

Contributor:

More control might not help. The if branch conditional will still be taken, with or without a shrink_to method. So you're hoping that the VecDeque::shrink_to implementation always gives exactly the same result as VecDeque::with_capacity, which isn't guaranteed.

Contributor (author):

It doesn't matter what shrink_to gives - as long as it doesn't reallocate if the new capacity is no different from the old one.

Contributor (author):

Also, if it were to produce 16, then nothing would happen since the capacity would already be smaller than that.

Contributor (author):

I do not want performance to mysteriously change based upon the implementation details of VecDeque changing.

The implementation details of VecDeque are not going to change to waste more than a factor of 2 in space, and this implementation is compatible with any other change.

so let's use that instead.

No, there's no reason to introduce another variable and code to track that to handle a situation that's not going to happen.

Contributor:

Source please? My understanding is that the capacity is not guaranteed in the slightest, and it could change to use any strategy it wishes, including choosing the capacity dynamically.

Yes it's unlikely that it will change, but that's not the same as it never changing.

Contributor (author):

My understanding is that the capacity is not guaranteed in the slightest, and it could change to use any strategy it wishes, including choosing the capacity dynamically.

In which case your solution is just as broken.

As far as re-allocation strategies go, it's well known that sub-power-of-2 allocation ratios are preferred, as they reduce fragmentation when re-allocating. The main reason power of 2 is used here is because it uses bit-masking to wrap indexes, and computers use base 2.

Nothing is certain, but given those facts, I can be more confident that the allocation ratio is not going to increase than I can be in any of the other myriad of assumptions underpinning the very existence of this crate. Even if it does increase, it's not going to break anything, so the impact/probability ratio is so low that this entire conversation is irrelevant.

You simply cannot performance tune without relying on implementation details, and attempting to do so is a waste of time, and is the definition of premature optimisation.

Contributor:

No, my solution isn't broken. If you create a VecDeque with capacity 10, but it's actually capacity 15, then we know if it hasn't gone above capacity 15 then it doesn't need to be reallocated, because Rust does guarantee that it will only reallocate if the number of elements goes above the current capacity. So it doesn't rely upon implementation details at all.
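
A quick illustrative sketch of the guarantee being relied on (not code from this PR): a VecDeque will not reallocate until its length exceeds the capacity it actually reports, even if that reported capacity is larger than what was requested:

use std::collections::VecDeque;

fn main() {
    let mut queue: VecDeque<usize> = VecDeque::with_capacity(10);
    // The reported capacity may be larger than 10 (e.g. 15), but pushing up
    // to that many elements is guaranteed not to reallocate.
    let cap = queue.capacity();
    for i in 0..cap {
        queue.push_back(i);
    }
    assert_eq!(queue.capacity(), cap);
}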

@Pauan (Contributor) left a comment:

Okay, I tested it, the performance seems on par, so this is okay with me.

@koute (Owner) commented Mar 27, 2018

So can this be considered ready?

@Diggsey (Contributor, author) commented Mar 27, 2018

Yes

@koute (Owner) commented Mar 28, 2018

Looks good to me! Thanks a lot!

@koute koute merged commit fed444b into koute:master Mar 28, 2018