AsyncOnSubscribe strange behavior #3341

Closed
akarnokd opened this issue Sep 12, 2015 · 17 comments

@akarnokd
Member

I've been asked to show test cases that prove the existence of bugs I mentioned in #3203.

First, I tried to show the race between inner termination and outer unsubscription, but the following simple code returns an error instead of any value:

@Test
public void bugRequestAlwaysMax() {
    Action2<Long, Observer<Observable<? extends Integer>>> action2 = new Action2<Long, Observer<Observable<? extends Integer>>>() {
        @Override
        public void call(Long t1, Observer<Observable<? extends Integer>> t2) {
            System.out.println(t1);
            int intValue = t1.intValue();
            t2.onNext(
                    Observable.range(1, intValue)
            );
        }
    };

    final TestSubscriber<Integer> ts = TestSubscriber.create(0);

    AsyncOnSubscribe.createStateless(action2).call(ts);

    ts.requestMore(1);

    ts.assertNoErrors();
    ts.assertValue(1);
}
java.lang.AssertionError: Unexpected onError events: 1
    at rx.observers.TestSubscriber.assertNoErrors(TestSubscriber.java:263)
    at rx.observables.AsyncOnSubscribeTest.bugUnsubscribeVsInnerComplete(AsyncOnSubscribeTest.java:443)
Caused by: java.lang.IllegalArgumentException: Count can not be negative
    at rx.Observable.range(Observable.java:2514)
    at rx.observables.AsyncOnSubscribeTest$27.call(AsyncOnSubscribeTest.java:425)
    at rx.observables.AsyncOnSubscribeTest$27.call(AsyncOnSubscribeTest.java:1)
    at rx.observables.AsyncOnSubscribe$3.call(AsyncOnSubscribe.java:223)
    at rx.observables.AsyncOnSubscribe$3.call(AsyncOnSubscribe.java:1)
    at rx.observables.AsyncOnSubscribe$AsyncOnSubscribeImpl.next(AsyncOnSubscribe.java:303)
    at rx.observables.AsyncOnSubscribe$AsyncOuterSubscriber.nextIteration(AsyncOnSubscribe.java:370)
    at rx.observables.AsyncOnSubscribe$AsyncOuterSubscriber.request(AsyncOnSubscribe.java:392)
    at rx.Subscriber.setProducer(Subscriber.java:209)
    at rx.observables.AsyncOnSubscribe.call(AsyncOnSubscribe.java:320)
    at rx.observables.AsyncOnSubscribe.call(AsyncOnSubscribe.java:1)
    at rx.observables.AsyncOnSubscribeTest.bugUnsubscribeVsInnerComplete(AsyncOnSubscribeTest.java:439)
    ... 26 more

It seems that AsyncOuterSubscriber always requests Long.MAX_VALUE even if the child requests nothing. This value, narrowed to int via intValue(), ends up as -1, hence the error.

I've debugged testOnUnsubscribeHasCorrectState and it seems there is an initial request of MAX_VALUE which triggers state 1. The value is also put into the queue, which is polled on the next iteration, yielding another MAX_VALUE and state 2. At this point, the main thread reaches requestMore(2), which triggers state 3, and requestMore(3) does nothing since the inner has called onCompleted().
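The -1 comes from Java's narrowing conversion and is independent of RxJava: t1.intValue() keeps only the low 32 bits of the long, so Long.MAX_VALUE truncates to -1. A minimal demonstration:

```java
public class NarrowingDemo {
    public static void main(String[] args) {
        long requested = Long.MAX_VALUE;   // the unbounded request amount
        // Narrowing keeps only the low 32 bits: 0xFFFFFFFF == -1 as an int.
        int count = (int) requested;       // same result as Long.intValue()
        System.out.println(count);         // prints -1
        // Observable.range(1, -1) then throws "Count can not be negative".
    }
}
```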

The following code hangs after a few dozen iterations. It tries to overlap the termination of the inner range with a concurrent cancellation.

ExecutorService exec = Executors.newSingleThreadExecutor();
try {
    for (int i = 0; i < 1000; i++) {
        if (i % 50 == 0) {
            System.out.println("bugUnsubscribeVsInnerComplete >> " + i);
        }
        final CyclicBarrier cb = new CyclicBarrier(2);
        Action2<Long, Observer<Observable<? extends Integer>>> action2 = new Action2<Long, Observer<Observable<? extends Integer>>>() {
            @Override
            public void call(Long t1, Observer<Observable<? extends Integer>> t2) {
                System.out.println(t1);
                t2.onNext(
                        Observable.range(1, 1)
                        .doOnSubscribe(new Action0() {
                            @Override
                            public void call() {
                                System.out.println("Subscribed to inner");
                            }
                        })
                        .doOnCompleted(new Action0() {
                            @Override
                            public void call() {
                                System.out.println("Entering completion handler");
                                await(cb);
                            }
                        })
                );
            }
        };

        final TestSubscriber<Integer> ts = TestSubscriber.create(0);

        Future<?> f = exec.submit(new Runnable() {
            @Override
            public void run() {
                System.out.println("Entering unsubscribe");
                await(cb);
                ts.unsubscribe();
            }
        });

        AsyncOnSubscribe.createStateless(action2).call(ts);


        f.get();

        ts.assertNoErrors();
    }
} finally {
    exec.shutdownNow();
}

I couldn't get further with the other bugs because the behavior of these two tests prevents any further progress.

Since I don't understand what the original intent was, nor how the request juggling is supposed to work, I can't offer any suggestion on what to fix or how to fix the operator. Therefore, I suggest rolling back the merge and extending the unit tests with more checks, including real concurrent ones (not just TestScheduler-based).

@stealthcode

@akarnokd Thanks for writing this up. I'll try to pick through it for problems. Regarding the first item (bugRequestAlwaysMax), the TestSubscriber is actually calling request with Long.MAX_VALUE. The TestSubscriber.create(0) doesn't seem to be deferring the request until after subscription. This could be because onStart() is not called. What do you think the appropriate fix should be?

@stealthcode

Why should we roll back? The AsyncOnSubscribe is marked as @Experimental and we have not made a release yet. This functionality is independent and clearly should not be in use until these issues can be addressed.

@akarnokd
Member Author

Why should we roll back?

I'm afraid the problem with this component will either delay the release of 1.0.15 again or degrade the reputation of the project due to a "knowingly buggy release".

What do you think the appropriate fix should be?

I don't understand the intent of the operator and why the request amounts are re-purposed the way they are. Did you try to implement something like an eager-concatMap?

@stealthcode

Sure we can roll back if that's what you think is best. I'll continue to hash out the changes in my branch and merge once you have had a chance to fully review the code.

Did you try to implement something like an eager-concatMap?

No. Although that was necessary for this feature, it was not the only goal. As I explained in #3003, we need an API that gives users an easy way to create observables that respect back pressure semantics. Observable.create is not sufficient for this; however, it is the method you would use if you had to produce an observable that provided no more than requested. The purpose of the AsyncOnSubscribe is so users can construct an observable from a non-eager source that can fulfill data requests for exactly up to the requested amount. The reason I used an eager concatMap is to handle the case where multiple concurrent requests are made to the same producer. This would typically be seen when there is variable latency associated with each batch, such as fulfilling data from an off-box service.

@akarnokd
Member Author

fulfill data requests for exactly up to the requested amount.

I have almost never found a case where the request amounts are hand-crafted. The only place this happens is in operators such as observeOn() and merge(), which batch out an arbitrary child request into standard element counts of 128 (by default).

It seems you want to turn a traditionally externally requested API (i.e., Observable getData(long maxAmount)) into something that uses the producer as a channel to communicate with the source under different semantics.

I think this is what full-duplex observables of the future may look like (some call them channels). This may be the aim for 3.x but is out of scope right now.

I'd take a step back and look at why you want such a give-me-N type of use. Perhaps the right combination of defer, concat, and subjects can get your desired behavior through a transformer instead.

@stealthcode

We have that use case here at Netflix and this is a frequently requested feature. @benjchristensen and I believe that we need ways to create observables that respect back pressure (without the jagged edges of Observable.create) in 1.x because of the way that back pressure was introduced in RxJava. This was all documented (and we were really hoping to get your feedback) in issue #3003.

@davidmoten
Collaborator

I went back and read #3003 and realized the similarities with #3020. I was interested in achieving the same aim except as an Operator/Transformer rather than an Observable creator. I don't think those use cases are particularly different: usually a simple stream using safe Observable creation methods can be gleaned from the creation use case, with the complexity devolved to an Operator/Transformer.

In short I agree with @akarnokd that a combination of existing operators can provide this functionality. I used a combination of materialize, scan, flatMap, and dematerialize to do it and I called the operation Transformers.stateMachine(). I found I could map the code in an operator like StringObservable.split straight over to initialState, transition, and completion lambdas and end up with an operator that respected backpressure with very little change to the original code (and without the potential blowout of using onBackpressureBuffer). The API and implementation of Transformers.stateMachine has been evolving slowly as my use cases arise and I expect to improve it further, in particular to push emissions downstream as soon as they are identified within the state machine, probably with very localized use of onBackpressureBuffer.

Here is the backpressure respecting implementation of StringObservable.split and @abersnaze will probably recognize the similarity with the original version that did not support backpressure.

@stealthcode My apologies I haven't reviewed the PR in any detail but have been relying on the diligence of @akarnokd and waiting for what is distilled between you. I think the intent of the PR is terrific, that easier backpressure implementation techniques for custom observables and operators is really valuable. It's definitely something I come across frequently.

@stealthcode

I'm looking for a replacement for Observable.create that gives users the ability to respond to requests and provide data precisely for what is requested. If you generate an excess then you have to buffer or drop data so that you don't overflow downstream buffers. This would be done by onBackpressureBuffer or onBackpressureDrop. With the AsyncOnSubscribe, data is buffered only when there would be overlapping and out-of-order data events.

@davidmoten could you give me a code example of something that generates data for each request from an asynchronous source using composition of existing operators? At first glance the Transformers.stateMachine looks like the SyncOnSubscribe. It parses and emits data while respecting back pressure requests. One major goal here is to provide a means to create observables that will play nicely with draining and filling request buffers in observeOn and merge.

@davidmoten
Collaborator

I've been reading #3003 and am starting to understand the async use case a bit more. My comments above were more relevant to SyncOnSubscribe, sorry about that.

Correct me if I'm wrong, but the suggested technique is, on every request of n, to generate an observable to fulfill those n items and concatenate its emissions with the observable generated from the previous request.

I'm wondering if we can explore the use case a bit further rather than discuss the implementation? Just to make sure we've covered alternatives. The use case that I've seen that speaks to me the strongest in this context is pagination.

Without anything new I would be looking at doing something like:

int pageSize = 1000;
int limit = 10000;
Observable
    .range(0, limit / pageSize)
    .concatMap(i -> restClient.getRange(i * pageSize, (i + 1) * pageSize)
                              .subscribeOn(scheduler));

The catch with that is that we want concatMap to maintain order (as opposed to flatMap), but concatMap doesn't subscribe to the second observable till the first observable is finished. For efficiency reasons we want more than one restClient call happening simultaneously, up to M of them. I'm starting to get this now (I hope).

To get a good non-blocking asynchronous solution the first thing I'd look for is an overload on Observable.concat() that gets passed a scheduler. This overload would subscribe to all of its children using the given scheduler and ensure correct ordering. I haven't yet thought of a way to do that with existing operators that is non-blocking but I'll keep pondering it.

In the meantime I wonder why this PR isn't simplified to contributing the Observable.concat() overload mentioned above.
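Outside Rx, the subscribe-eagerly-but-emit-in-order idea is the classic futures pattern: start every page fetch up front, then consume the results in submission order. A plain-Java sketch (getRange here is a made-up stand-in for the restClient call; note that f.get() blocks, so this only illustrates the ordering, not a non-blocking eager concat):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class EagerOrderedFetch {
    // Stand-in for restClient.getRange(from, to): returns the page contents.
    static List<Integer> getRange(int from, int to) {
        List<Integer> page = new ArrayList<Integer>();
        for (int i = from; i < to; i++) page.add(i);
        return page;
    }

    public static void main(String[] args) throws Exception {
        int pageSize = 1000, limit = 10000;
        ExecutorService exec = Executors.newFixedThreadPool(4);
        // Eager: every fetch is started immediately...
        List<Future<List<Integer>>> pages = new ArrayList<Future<List<Integer>>>();
        for (int i = 0; i < limit / pageSize; i++) {
            final int from = i * pageSize, to = (i + 1) * pageSize;
            pages.add(exec.submit(new Callable<List<Integer>>() {
                public List<Integer> call() { return getRange(from, to); }
            }));
        }
        // ...but results are consumed strictly in submission order.
        List<Integer> all = new ArrayList<Integer>();
        for (Future<List<Integer>> f : pages) all.addAll(f.get());
        exec.shutdown();
        System.out.println(all.size()); // 10000
    }
}
```

An eager concat operator would do the same thing non-blockingly: subscribe to all inner observables at once, buffer the later ones, and replay them in order as each predecessor completes.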

@davidmoten
Collaborator

ah, still getting up to speed. Noticed that @stealthcode and @akarnokd have already discussed the eager concatMap. I'd like to see that operator as part of the public API of this PR.

@akarnokd
Member Author

@davidmoten I was under the same impression as you: the requirement is most likely a form of pagination plus eager subscription while maintaining order. What I understood is that whenever a consumer calls request(20), some generator has to return an observable that will generate exactly 20 elements. Once all 20 elements have been received, another call with request(10) will generate 10 elements.

This behavior, contrary to the name of the proposed operator, works only in synchronous mode, because the moment you apply observeOn, it will first request 128 and then 1-2-5 depending on the speed of elements. That results in several observables with different numbers of elements, misaligned with the original request amount. (I once proposed a request-rebatching operator in #3030 but it died due to lack of interest.)

I sense that this operator originates from a requirement where implementation details leaked into the requirement itself.

@davidmoten
Collaborator

I'm also a bit uncomfortable about relying on sensible request sizes to be matched to observable creations. It does seem to open up a need for a request batching operator as mentioned by @akarnokd.

I'd prefer to see a PR with an eager concat operator (which is where the majority of the tricky work is seemingly) followed by a separate PR with the Observable creation utility that would use the eager concat (without relying on requests being matched to observable creations as presented by this PR currently).

@stealthcode

What I understood is that whenever a consumer calls request(20), some generator has to return an observable that will generate exactly 20 elements. Once all 20 elements have been received, another call with request(10) will generate 10 elements.

There is nothing preventing multiple requests from a subscriber while the first request is being processed. This could happen when each response is asynchronous, or if observeOn or merge are changed to request more as their buffers are drained.

I sense that this operator originates from a requirement where implementation details leaked into the requirement itself.

Can you elaborate on this? First, it's not an operator. Second, the requirement is that we give users a way to create observables that support back pressure even in cases where the data is fulfilled asynchronously. I fail to see how the "implementation details leaked". The purpose of the AsyncOnSubscribe is to expose the requests in a simple interface so applications can respond with asynchronous abstractions, and an observable makes sense as a consistent wrapper for asynchronous work.

@davidmoten Operators do not expose a simple-to-understand request-response protocol. The AsyncOnSubscribe allows a service to respond to a request with an asynchronous response. It'd be great to have the eager concat operator, but that's not the goal of #3003 and I do not think the two should be associated or necessarily dependent. For each request, respond asynchronously with data emissions fulfilling that request. This is a means to fill downstream buffers progressively without excess buffering, dropping, etc. If we expected users to eagerConcatMap their asynchronous responses together, then we would have to expose a way to materialize the request(n) so that requests could be subscribed to and mapped into responses. I do not think it's a good idea to expose/support that depth of back pressure in the public API for users to respond to by any means they choose.

I need a way for application writers to construct ordered data in response to one or more request(n) calls in a way that does not violate back pressure and does not interleave. The pagination case is a great example highlighting the need for observables that can fulfill requests asynchronously in a simple manner (without the use of something like RxNetty, which takes the observable to the selector).

Do you have objections to the goal or design?

@akarnokd
Member Author

See my proposed fixes in #3356.

@davidmoten
Collaborator

It would be great to have the eager concat operator but that's not the goal of #3003 and I do not think that the two should be associated or necessarily dependent.

I was under the impression that the functionality of eager concat map would be the core of AsyncOnSubscribe and that once that was available constructing AsyncOnSubscribe would be quite simple. I am keen for the eager concat map functionality to be exposed from the get go from a reuse perspective. Certainly if we decide the request matching technique is a goer then it makes perfect sense to give users that instead of expecting them to use eager concat.

I'm still concerned about the practicality of using requests to create Observables.

Suppose I have an observable built using AsyncOnSubscribe and I'm hoping to request(1000) at a time from it for pagination purposes. Let's get a subscriber to control the requests:

source.subscribe(paginatingSubscriber);

Now as code evolves I would imagine wanting to insert an operator in this chain. @akarnokd has already mentioned the effect observeOn would have but I might try something even simpler, filter:

source
    .filter(condition)
    .subscribe(paginatingSubscriber);

Suppose the filter allows only half of the items through; then one effect of the filter operator is that when an item is rejected, a call of request(1) occurs. So one request of 1000 from the paginatingSubscriber has resulted in a single request of 1000 from the source followed by 500 requests of one, and each request gets matched to, say, a REST API call. So the pagination batching intent has been subverted by the filter operator and most probably efficiency goes out the window.

To handle this you might expect that an operator could be stuck just after source to do request batching. Requests would be buffered by this operator and, so that we don't stall, the accumulated request count would be sent upstream every N milliseconds. So the operator would look a bit like this perhaps:

source
    .bufferRequests(1000, 100, TimeUnit.MILLISECONDS)
    .filter(condition)
    .subscribe(subscriber);

A design question then is should AsyncOnSubscribe incorporate the functionality of this made-up operator bufferRequests?

@stealthcode

Suppose the filter allows only half of the items through, then one of the effects of the filter operator is when an item is rejected by the filter a call of request(1) occurs.

This simply sounds like an inefficient implementation of filter. I propose that filter be written to track its own rejection count and request more before the outstanding requested count reaches 0. There are several instances where the request patterns should be optimized. It would be great if we also reviewed the request patterns of other operators for more optimal solutions (merge and observeOn namely).
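The gap between the two request patterns can be made concrete with a small counting simulation (plain Java, no RxJava; the half-pass predicate and the request of 1000 are arbitrary illustrative choices):

```java
public class FilterRequestSim {
    // Naive filter: one request(1) upstream per rejected item.
    // The predicate keeps even values, so half of the items pass.
    static int naiveRequestCalls(int want) {
        int calls = 1;                   // the initial request(want)
        int delivered = 0, item = 0;
        while (delivered < want) {
            if (item++ % 2 == 0) delivered++;
            else calls++;                // request(1) per rejection
        }
        return calls;
    }

    // Batching filter: count rejections and issue one request(rejected)
    // only when the current upstream batch is exhausted.
    static int batchedRequestCalls(int want) {
        int calls = 1, delivered = 0, item = 0;
        int remaining = want, rejected = 0;
        while (delivered < want) {
            if (item++ % 2 == 0) delivered++;
            else rejected++;
            if (--remaining == 0 && delivered < want) {
                calls++;                 // one request(rejected)
                remaining = rejected;
                rejected = 0;
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // 1000: the initial request plus one request(1) per rejection.
        System.out.println(naiveRequestCalls(1000));
        // Roughly log2(1000) calls: each batch halves the shortfall.
        System.out.println(batchedRequestCalls(1000));
    }
}
```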

It's an interesting idea to try to code an AsyncOnSubscribe that buffers its requests using a subject and operators internal to its implementation. Here is a gist of what I'm thinking. This exercise might inform some modifications to the API to make that easier; for instance, an implementation that used a subject as the state, onNexted the requested value, and then accumulated/throttled the requests.

should AsyncOnSubscribe incorporate the functionality of this made-up operator bufferRequests?

I'd rather see functionality like bufferRequests be implementable as an operator using the AsyncOnSubscribe inside its implementation, instead of creating a special-purpose operator on Observable. We shouldn't have to introduce special-case request handling to operators that should simply play nicely together. As an alternative to creating such an operator, I think we should just fix the operators that make requests inefficiently.

@davidmoten
Collaborator

As an addendum to my last post, the time-windowed bufferRequests is essentially blocking, which is not really the RxJava way. Rather than waiting to receive a batch's worth of requests, we would request more than needed (up to the next integer multiple of the batch size). In this case the operator/functionality would look like this:

source
    .bufferRequests(1000)
    .filter(condition)
    .subscribe(subscriber);
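The round-up that such a non-blocking bufferRequests(batchSize) would perform is plain integer arithmetic (the operator and the method name are hypothetical):

```java
public class RoundUp {
    // Round a downstream request up to the next multiple of batchSize, so
    // the upstream request is never smaller than what was asked for.
    static long roundUpToBatch(long n, long batchSize) {
        return ((n + batchSize - 1) / batchSize) * batchSize;
    }

    public static void main(String[] args) {
        System.out.println(roundUpToBatch(1, 1000));    // 1000
        System.out.println(roundUpToBatch(1000, 1000)); // 1000
        System.out.println(roundUpToBatch(1001, 1000)); // 2000
    }
}
```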

This simply sounds like an inefficient implementation of filter. I propose that filter be written to track its own rejection count and request more before the outstanding requested count reaches 0. There are several instances where the request patterns should be optimized. It would be great if we also reviewed the request patterns of other operators for more optimal solutions (merge and observeOn namely).

I'll be interested to see your suggestions on this one.

I'll read the rest of your answer later (time to go home!).
