AsyncOnSubscribe strange behavior #3341

Closed
akarnokd opened this issue Sep 12, 2015 · 17 comments

@akarnokd
Member

I've been asked to show test cases that prove the existence of bugs I mentioned in #3203.

First, I tried to show the race between inner termination and outer unsubscription, but the following simple code returns an error instead of any value:

@Test
public void bugRequestAlwaysMax() {
    Action2<Long, Observer<Observable<? extends Integer>>> action2 = new Action2<Long, Observer<Observable<? extends Integer>>>() {
        @Override
        public void call(Long t1, Observer<Observable<? extends Integer>> t2) {
            System.out.println(t1);
            int intValue = t1.intValue();
            t2.onNext(
                    Observable.range(1, intValue)
            );
        }
    };

    final TestSubscriber<Integer> ts = TestSubscriber.create(0);

    AsyncOnSubscribe.createStateless(action2).call(ts);

    ts.requestMore(1);

    ts.assertNoErrors();
    ts.assertValue(1);
}
java.lang.AssertionError: Unexpected onError events: 1
    at rx.observers.TestSubscriber.assertNoErrors(TestSubscriber.java:263)
    at rx.observables.AsyncOnSubscribeTest.bugUnsubscribeVsInnerComplete(AsyncOnSubscribeTest.java:443)
Caused by: java.lang.IllegalArgumentException: Count can not be negative
    at rx.Observable.range(Observable.java:2514)
    at rx.observables.AsyncOnSubscribeTest$27.call(AsyncOnSubscribeTest.java:425)
    at rx.observables.AsyncOnSubscribeTest$27.call(AsyncOnSubscribeTest.java:1)
    at rx.observables.AsyncOnSubscribe$3.call(AsyncOnSubscribe.java:223)
    at rx.observables.AsyncOnSubscribe$3.call(AsyncOnSubscribe.java:1)
    at rx.observables.AsyncOnSubscribe$AsyncOnSubscribeImpl.next(AsyncOnSubscribe.java:303)
    at rx.observables.AsyncOnSubscribe$AsyncOuterSubscriber.nextIteration(AsyncOnSubscribe.java:370)
    at rx.observables.AsyncOnSubscribe$AsyncOuterSubscriber.request(AsyncOnSubscribe.java:392)
    at rx.Subscriber.setProducer(Subscriber.java:209)
    at rx.observables.AsyncOnSubscribe.call(AsyncOnSubscribe.java:320)
    at rx.observables.AsyncOnSubscribe.call(AsyncOnSubscribe.java:1)
    at rx.observables.AsyncOnSubscribeTest.bugUnsubscribeVsInnerComplete(AsyncOnSubscribeTest.java:439)
    ... 26 more

It seems that AsyncOuterSubscriber always requests Long.MAX_VALUE even if the child requests nothing. This value, narrowed to int via intValue(), ends up as -1, hence the error.

I've debugged testOnUnsubscribeHasCorrectState and it seems there is an initial request of MAX_VALUE which triggers state 1. The value is also put into the queue, which is polled on the next iteration, yielding another MAX_VALUE and state 2. At this point, the main thread reaches requestMore(2), which triggers state 3, and requestMore(3) does nothing since the inner has called onCompleted().
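The -1 comes from Java's narrowing conversion and is independent of RxJava: t1.intValue() keeps only the low 32 bits of the long, so Long.MAX_VALUE truncates to -1. A minimal demonstration:

```java
public class NarrowingDemo {
    public static void main(String[] args) {
        long requested = Long.MAX_VALUE;   // the unbounded request amount
        // Narrowing keeps only the low 32 bits: 0xFFFFFFFF == -1 as an int.
        int count = (int) requested;       // same result as Long.intValue()
        System.out.println(count);         // prints -1
        // Observable.range(1, -1) then throws "Count can not be negative".
    }
}
```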

The following code hangs after a few dozen iterations. It tries to overlap the termination of the inner range with a concurrent cancellation.

ExecutorService exec = Executors.newSingleThreadExecutor();
try {
    for (int i = 0; i < 1000; i++) {
        if (i % 50 == 0) {
            System.out.println("bugUnsubscribeVsInnerComplete >> " + i);
        }
        final CyclicBarrier cb = new CyclicBarrier(2);
        Action2<Long, Observer<Observable<? extends Integer>>> action2 = new Action2<Long, Observer<Observable<? extends Integer>>>() {
            @Override
            public void call(Long t1, Observer<Observable<? extends Integer>> t2) {
                System.out.println(t1);
                t2.onNext(
                        Observable.range(1, 1)
                        .doOnSubscribe(new Action0() {
                            @Override
                            public void call() {
                                System.out.println("Subscribed to inner");
                            }
                        })
                        .doOnCompleted(new Action0() {
                            @Override
                            public void call() {
                                System.out.println("Entering completion handler");
                                await(cb);
                            }
                        })
                );
            }
        };

        final TestSubscriber<Integer> ts = TestSubscriber.create(0);

        Future<?> f = exec.submit(new Runnable() {
            @Override
            public void run() {
                System.out.println("Entering unsubscribe");
                await(cb);
                ts.unsubscribe();
            }
        });

        AsyncOnSubscribe.createStateless(action2).call(ts);


        f.get();

        ts.assertNoErrors();
    }
} finally {
    exec.shutdownNow();
}

I couldn't get further with the other bugs because the behavior of these two tests prevents any further progress.

Since I don't understand what the original intent was, nor how the request juggling is supposed to work, I can't offer any suggestion on what to fix or how to fix the operator. Therefore, I suggest rolling back the merge and extending the unit tests with more checks, including real concurrent ones (not just TestScheduler-based).

@stealthcode

@akarnokd Thanks for writing this up. I'll try to pick through it for problems. Regarding the first item (bugRequestAlwaysMax), the TestSubscriber is actually calling request with Long.MAX_VALUE. The TestSubscriber.create(0) doesn't seem to be deferring the request until after subscription. This could be because onStart() is not called. What do you think the appropriate fix should be?

@stealthcode

Why should we roll back? The AsyncOnSubscribe is marked as @Experimental and we have not made a release yet. This functionality is independent and clearly should not be in use until these issues can be addressed.

@akarnokd
Member Author

Why should we roll back?

I'm afraid the problem with this component will either delay the release of 1.0.15 again or degrade the reputation of the project due to a "knowingly buggy release".

What do you think the appropriate fix should be?

I don't understand the intent of the operator and why the request amounts are re-purposed the way they are. Did you try to implement something like an eager-concatMap?

@stealthcode

Sure we can roll back if that's what you think is best. I'll continue to hash out the changes in my branch and merge once you have had a chance to fully review the code.

Did you try to implement something like an eager-concatMap?

No. Although that was necessary for this feature, it was not the only goal. As I explained in #3003, we need an API that gives users an easy way to create observables that respect back pressure semantics. Observable.create is not sufficient for this; however, it is the method you would use if you had to produce an observable that provided no more than requested. The purpose of the AsyncOnSubscribe is so users can construct an observable from a non-eager source that can fulfill data requests for exactly up to the requested amount. The reason I used an eager concatMap is to handle the case where multiple concurrent requests are made to the same producer. This would typically be seen when there is variable latency associated with each batch, such as fulfilling data from an off-box service.

@akarnokd
Member Author

fulfill data requests for exactly up to the requested amount.

I have almost never found a case where the request amounts are hand-crafted. The only place this happens is in operators such as observeOn() and merge(), which batch out an arbitrary child request into standard element counts of 128 (by default).

It seems you want to turn a traditionally externally requested API (i.e., Observable getData(long maxAmount)) into something that uses the producer as a channel to communicate with the source under different semantics.

I think this is what full-duplex observables of the future may look like (some call them channels). This may be the aim for 3.x but is out of scope right now.

I'd take a step back and look at why you want such a give-me-N type of use. Perhaps the right combination of defer, concat, and subjects can get your desired behavior through a transformer instead.

@stealthcode

We have that use case here at Netflix and this is a frequently requested feature. @benjchristensen and I believe that we need ways to create observables that respect back pressure (without the jagged edges of Observable.create) in 1.x because of the way that back pressure was introduced in RxJava. This was all documented (and we were really hoping to get your feedback) in issue #3003.

@davidmoten
Collaborator

I went back and read #3003 and realized the similarities with #3020. I was interested in achieving the same aim except as an Operator/Transformer rather than an Observable creator. I don't think those use cases are particularly different: usually a simple stream using safe Observable creation methods can be gleaned from the creation use case, with the complexity devolved to an Operator/Transformer.

In short I agree with @akarnokd that a combination of existing operators can provide this functionality. I used a combination of materialize, scan, flatMap, and dematerialize to do it and I called the operation Transformers.stateMachine(). I found I could map the code in an operator like StringObservable.split straight over to initialState, transition, and completion lambdas and end up with an operator that respected backpressure with very little change to the original code (and without the potential blowout of using onBackpressureBuffer). The API and implementation of Transformers.stateMachine has been evolving slowly as my use cases arise and I expect to improve it further, in particular to push emissions downstream as soon as they are identified within the state machine, probably with very localized use of onBackpressureBuffer.

Here is the backpressure respecting implementation of StringObservable.split and @abersnaze will probably recognize the similarity with the original version that did not support backpressure.

@stealthcode My apologies I haven't reviewed the PR in any detail but have been relying on the diligence of @akarnokd and waiting for what is distilled between you. I think the intent of the PR is terrific, that easier backpressure implementation techniques for custom observables and operators is really valuable. It's definitely something I come across frequently.

@stealthcode

I'm looking for a replacement for Observable.create that gives users the ability to respond to requests and provide data precisely for what is requested. If you generate an excess then you have to buffer or drop data so that you don't overflow downstream buffers. This would be done by onBackpressureBuffer or onBackpressureDrop. With the AsyncOnSubscribe, data is buffered only when there would be overlapping and out-of-order data events.

@davidmoten could you give me a code example of something that generates data for each request from an asynchronous source using composition of existing operators? At first glance the Transformers.stateMachine looks like the SyncOnSubscribe. It parses and emits data while respecting back pressure requests. One major goal here is to provide a means to create observables that will play nicely with draining and filling request buffers in observeOn and merge.

@davidmoten
Collaborator

I've been reading #3003 and am starting to understand the async use case a bit more. My comments above were more relevant to SyncOnSubscribe, sorry about that.

Correct me if I'm wrong, but the suggested technique is, on every request of n, to generate an observable to fulfill those n items and concatenate its emissions with the observable generated from the previous request.

I'm wondering if we can explore the use case a bit further rather than discuss the implementation? Just to make sure we've covered alternatives. The use case that I've seen that speaks to me the strongest in this context is pagination.

Without anything new I would be looking at doing something like:

int pageSize = 1000;
int limit = 10000;
Observable
    .range(0, limit / pageSize)
    .concatMap(i -> restClient.getRange(i * pageSize, (i + 1) * pageSize)
                              .subscribeOn(scheduler));

The catch with that is that we want concatMap to maintain order (as opposed to flatMap), but concatMap doesn't subscribe to the second observable till the first observable is finished. For efficiency reasons we want more than one restClient call happening simultaneously, up to M of them. I'm starting to get this now (I hope).

To get a good non-blocking asynchronous solution the first thing I'd look for is an overload on Observable.concat() that gets passed a scheduler. This overload would subscribe to all of its children using the given scheduler and ensure correct ordering. I haven't yet thought of a way to do that with existing operators that is non-blocking but I'll keep pondering it.

In the meantime I wonder why this PR isn't simplified to contributing the Observable.concat() overload mentioned above.
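Outside Rx, the subscribe-eagerly-but-emit-in-order idea is the classic futures pattern: start every page fetch up front, then consume the results in submission order. A plain-Java sketch (getRange here is a made-up stand-in for the restClient call; note that f.get() blocks, so this only illustrates the ordering, not a non-blocking eager concat):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class EagerOrderedFetch {
    // Stand-in for restClient.getRange(from, to): returns the page contents.
    static List<Integer> getRange(int from, int to) {
        List<Integer> page = new ArrayList<Integer>();
        for (int i = from; i < to; i++) page.add(i);
        return page;
    }

    public static void main(String[] args) throws Exception {
        int pageSize = 1000, limit = 10000;
        ExecutorService exec = Executors.newFixedThreadPool(4);
        // Eager: every fetch is started immediately...
        List<Future<List<Integer>>> pages = new ArrayList<Future<List<Integer>>>();
        for (int i = 0; i < limit / pageSize; i++) {
            final int from = i * pageSize, to = (i + 1) * pageSize;
            pages.add(exec.submit(new Callable<List<Integer>>() {
                public List<Integer> call() { return getRange(from, to); }
            }));
        }
        // ...but results are consumed strictly in submission order.
        List<Integer> all = new ArrayList<Integer>();
        for (Future<List<Integer>> f : pages) all.addAll(f.get());
        exec.shutdown();
        System.out.println(all.size()); // 10000
    }
}
```

An eager concat operator would do the same thing non-blockingly: subscribe to all inner observables at once, buffer the later ones, and replay them in order as each predecessor completes.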

@davidmoten
Collaborator

ah, still getting up to speed. Noticed that @stealthcode and @akarnokd have already discussed the eager concatMap. I'd like to see that operator as part of the public API of this PR.

@akarnokd
Member Author

@davidmoten I was under the same impression as you: the requirement is most likely a form of pagination plus eager subscription while maintaining order. What I understood is that whenever a consumer calls request(20), some generator has to return an observable that will generate exactly 20 elements. Once all 20 elements have been received, another call with request(10) will generate 10 elements.

This behavior, contrary to the name of the proposed operator, works only in synchronous mode, because the moment you apply observeOn, it will first request 128 and then 1-2-5 depending on the speed of elements. That results in several observables with different numbers of elements, misaligned with the original request amount. (I once proposed a request-rebatching operator in #3030 but it died due to lack of interest.)

I sense that this operator originates from a requirement where implementation details leaked into the requirement itself.

@davidmoten
Collaborator

I'm also a bit uncomfortable about relying on sensible request sizes to be matched to observable creations. It does seem to open up a need for a request batching operator as mentioned by @akarnokd.

I'd prefer to see a PR with an eager concat operator (which is where the majority of the tricky work is seemingly) followed by a separate PR with the Observable creation utility that would use the eager concat (without relying on requests being matched to observable creations as presented by this PR currently).

@stealthcode

What I understood is that whenever a consumer calls request(20), some generator has to return an observable that will generate exactly 20 elements. Once all 20 elements have been received, another call with request(10) will generate 10 elements.

There is nothing preventing multiple requests from a subscriber while the first request is being processed. This could happen when each response is asynchronous, or if observeOn or merge are changed to request more as their buffers are drained.

I sense that this operator originates from a requirement where implementation details leaked into the requirement itself.

Can you elaborate on this? First, it's not an operator. Second, the requirement is that we give users a way to create observables that support back pressure even in cases where the data is fulfilled asynchronously. I fail to see how the "implementation details leaked". The purpose of the AsyncOnSubscribe is to expose the requests in a simple interface so applications can respond with asynchronous abstractions, and an observable makes sense as a consistent wrapper for asynchronous work.

@davidmoten Operators do not expose a simple-to-understand request-response protocol. The AsyncOnSubscribe allows a service to respond to a request with an asynchronous response. It'd be great to have the eager concat operator, but that's not the goal of #3003 and I do not think the two should be associated or necessarily dependent. For each request, respond asynchronously with data emissions fulfilling that request. This is a means to fill downstream buffers progressively without excess buffering, dropping, etc. If we expected users to eagerConcatMap their asynchronous responses together, then we would have to expose a way to materialize the request(n) so that requests could be subscribed to and mapped into responses. I do not think it's a good idea to expose/support that depth of back pressure in the public API for users to respond to by any means they choose.

I need a way for application writers to construct ordered data in response to one or more request(n) calls in a way that does not violate back pressure and does not interleave. The pagination case is a great example highlighting the need for observables that can fulfill requests asynchronously in a simple manner (without the use of something like RxNetty, which takes the observable to the selector).

Do you have objections to the goal or design?

@akarnokd
Member Author

See my proposed fixes in #3356.

@davidmoten
Collaborator

It would be great to have the eager concat operator but that's not the goal of #3003 and I do not think that the two should be associated or necessarily dependent.

I was under the impression that the functionality of eager concat map would be the core of AsyncOnSubscribe and that once that was available constructing AsyncOnSubscribe would be quite simple. I am keen for the eager concat map functionality to be exposed from the get go from a reuse perspective. Certainly if we decide the request matching technique is a goer then it makes perfect sense to give users that instead of expecting them to use eager concat.

I'm still concerned about the practicality of using requests to create Observables.

Suppose I have an observable built using AsyncOnSubscribe and I'm hoping to request(1000) at a time from it for pagination purposes. Let's get a subscriber to control the requests:

source.subscribe(paginatingSubscriber);

Now as code evolves I would imagine wanting to insert an operator in this chain. @akarnokd has already mentioned the effect observeOn would have but I might try something even simpler, filter:

source
    .filter(condition)
    .subscribe(paginatingSubscriber);

Suppose the filter allows only half of the items through; then one effect of the filter operator is that when an item is rejected, a call of request(1) occurs. So one request of 1000 from the paginatingSubscriber has resulted in a single request of 1000 from the source followed by 500 requests of one, and each request gets matched to, say, a REST API call. So the pagination batching intent has been subverted by the filter operator and most probably efficiency goes out the window.

To handle this you might expect that an operator could be stuck just after source to do request batching. Requests would be buffered by this operator and, so that we don't stall, the accumulated request count would be sent upstream every N milliseconds. So the operator would look a bit like this perhaps:

source
    .bufferRequests(1000, 100, TimeUnit.MILLISECONDS)
    .filter(condition)
    .subscribe(subscriber);

A design question then is should AsyncOnSubscribe incorporate the functionality of this made-up operator bufferRequests?

@stealthcode

Suppose the filter allows only half of the items through, then one of the effects of the filter operator is when an item is rejected by the filter a call of request(1) occurs.

This simply sounds like an inefficient implementation of filter. I propose that filter be written to track its own rejection count and request more before the outstanding requested count reaches 0. There are several instances where the request patterns should be optimized. It would be great if we also reviewed the request patterns of other operators for more optimal solutions (merge and observeOn namely).
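The gap between the two request patterns can be made concrete with a small counting simulation (plain Java, no RxJava; the half-pass predicate and the request of 1000 are arbitrary illustrative choices):

```java
public class FilterRequestSim {
    // Naive filter: one request(1) upstream per rejected item.
    // The predicate keeps even values, so half of the items pass.
    static int naiveRequestCalls(int want) {
        int calls = 1;                   // the initial request(want)
        int delivered = 0, item = 0;
        while (delivered < want) {
            if (item++ % 2 == 0) delivered++;
            else calls++;                // request(1) per rejection
        }
        return calls;
    }

    // Batching filter: count rejections and issue one request(rejected)
    // only when the current upstream batch is exhausted.
    static int batchedRequestCalls(int want) {
        int calls = 1, delivered = 0, item = 0;
        int remaining = want, rejected = 0;
        while (delivered < want) {
            if (item++ % 2 == 0) delivered++;
            else rejected++;
            if (--remaining == 0 && delivered < want) {
                calls++;                 // one request(rejected)
                remaining = rejected;
                rejected = 0;
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // 1000: the initial request plus one request(1) per rejection.
        System.out.println(naiveRequestCalls(1000));
        // Roughly log2(1000) calls: each batch halves the shortfall.
        System.out.println(batchedRequestCalls(1000));
    }
}
```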

It's an interesting idea to try to code an AsyncOnSubscribe that buffers its requests using a subject and operators internal to its implementation. Here is a gist of what I'm thinking. This exercise might inform some modifications to the API to make that easier; for instance, an implementation that used a subject as the state, onNexted the requested value, and then accumulated/throttled the requests.

should AsyncOnSubscribe incorporate the functionality of this made-up operator bufferRequests?

I'd rather see functionality like bufferRequests be implementable as an operator using the AsyncOnSubscribe inside its implementation, instead of creating a special-purpose operator on Observable. We shouldn't have to introduce special-case request handling to operators that should simply play nicely together. As an alternative to creating such an operator, I think we should just fix the operators that make requests inefficiently.

@davidmoten
Collaborator

As an addendum to my last post, the time-windowed bufferRequests is essentially blocking, which is not really the RxJava way. Rather than waiting to receive a batch's worth of requests, we would request more than needed (up to the next integer multiple of the batch size). In this case the operator/functionality would look like this:

source
    .bufferRequests(1000)
    .filter(condition)
    .subscribe(subscriber);
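The round-up that such a non-blocking bufferRequests(batchSize) would perform is plain integer arithmetic (the operator and the method name are hypothetical):

```java
public class RoundUp {
    // Round a downstream request up to the next multiple of batchSize, so
    // the upstream request is never smaller than what was asked for.
    static long roundUpToBatch(long n, long batchSize) {
        return ((n + batchSize - 1) / batchSize) * batchSize;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToBatch(1, 1000));    // 1000
        System.out.println(roundUpToBatch(1000, 1000)); // 1000
        System.out.println(roundUpToBatch(1001, 1000)); // 2000
    }
}
```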

This simply sounds like an inefficient implementation of filter. I propose that filter be written to track its own rejection count and request more before the outstanding requested count reaches 0. There are several instances where the request patterns should be optimized. It would be great if we also reviewed the request patterns of other operators for more optimal solutions (merge and observeOn namely).

I'll be interested to see your suggestions on this one.

I'll read the rest of your answer later (time to go home!).
