edit

astubbs · astubbs · commit 1317fe7b64a3 · 2021-03-02T14:47:44.000+13:00
add scenario

edit
diff --git a/README.adoc b/README.adoc
@@ -23,12 +23,11 @@
 :base_url: https://github.com/confluentinc/{github_name}
 :issues_link: {base_url}/issues
 
-// dynamic include base for editing in IDEA
+// dynamic include base for rendering
 :project_root: ./
 
-// uncomment the following if not using IDEA or having issues, for editing the template to see the includes
-// note that with this line not commented out, the rendering of the root level asiidoc file will be incorrect (i.e.
-// leave it commented out when committing work)
+// uncomment the following if using IDEA or having issues, for editing the template to see the includes
+// note that with this line not commented out, the rendering of the root level asciidoc file will be incorrect (i.e. leave it commented out when committing work)
 //:project_root: ../../
 
 
@@ -189,9 +188,12 @@ The end-to-end latency of the responses to these answers needs to be as low as t
 * Kafka Streams app that had a slow stage
 ** We use Kafka Streams for our message processing, but one of it's steps have characteristics of the above and we need better performance.
 We can break out as described below into the tool for processing that step, then return to the Kafka Streams context.
+** Message load spikes causing heavy hot spots on join operations, where a disproportionate number of messages land on one partition.
+*** If keys cannot be adjusted or the topology modified to better distribute messages across partitions, consider performing a <<parallel-joins,parallel join operation>>.
 * Provisioning extra machines (either virtual machines or real machines) to run multiple clients has a cost, using this library instead avoids the need for extra instances to be deployed in any respect.
 
 
+
 == Feature List
 * Have massively parallel consumption processing without running hundreds or thousands of
 ** Kafka consumer clients
@@ -488,6 +490,7 @@ In future versions, we plan to look at supporting other streaming systems like h
 
 See the link:{project_root}/parallel-consumer-examples/parallel-consumer-example-vertx/src/main/java/io/confluent/parallelconsumer/examples/vertx/VertxApp.java[Vert.x example] project, and it's test.
 
+
 [[streams-usage-code]]
 === Kafka Streams Concurrent Processing
 
@@ -496,22 +499,6 @@ Use your Streams app to process your data first, then send anything needed to be
 .Example usage with Kafka Streams
 image::https://lucid.app/publicSegments/view/43f2740c-2a7f-4b7f-909e-434a5bbe3fbf/image.png[Kafka Streams Usage, align="center"]
 
-==== Parallel Joins
-
-PC can also be used to implement a parallel join against a data source, namely a Kafka Streams (KS) state store.
-
-KS state stores can be read by external threads (external to the toplogy context), and so they can be reference in the processing function of PC.
-If no item is present in the state store for the key, then it's a miss.
-If there is, you can join the data any way you please, and potentially call an external system with the joined data, or output it back to a Kafka topic using the `#pollAndProdce` API if trying to avoid https://thorben-janssen.com/dual-writes/[dual write] scnearios.
-
-image::https://lucid.app/publicSegments/view/d144c027-653c-4e77-bfa4-8ecaadba1385/image.png[]
-
-WARNING::Although using a `GlobalKTable` is not strictly nesescary, the https://docs.confluent.io/platform/current/streams/developer-guide/interactive-queries.html[extra network hop] caused by data not being colocated when using a sharded `KTable` may negate the performance benefits in some scenarios.
-For this reason, a `GlobalKTable` is recommended.
-
-WARNING::Performing a join outsode of KS relinquishes ordering efforts KS applies to populating each side of the join.
-Be careful using this technique if your operation is sensitive to the order in which data is populated in the state store vs arriving on the event stream.
-
 .Preprocess in Kafka Streams, then process concurrently
 [source,java,indent=0]
 ----
@@ -551,6 +538,51 @@ Be careful using this technique if your operation is sensitive to the order in w
 
 See the link:{project_root}/parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/StreamsApp.java[Kafka Streams example] project, and it's test.
 
+
+[[parallel-joins]]
+==== Parallel Joins
+
+PC can also be used to implement a parallel join against a data source, namely a Kafka Streams (KS) state store. KS state stores can be read by external threads (external to the topology context), and so they can be referenced in the processing function of PC.
+
+image::https://lucid.app/publicSegments/view/d144c027-653c-4e77-bfa4-8ecaadba1385/image.png[]
+
+CAUTION: Although using a `GlobalKTable` is not strictly necessary, the https://docs.confluent.io/platform/current/streams/developer-guide/interactive-queries.html[extra network hop] caused by data not being collocated when using a sharded `KTable` may negate the performance benefits in some scenarios.
+For this reason, a `GlobalKTable` is recommended.
+
+.Parallel joins with Kafka Streams
+[source,java,indent=0]
+----
+    /**
+     * Needs KeyValueStore injected.
+     */
+    ParallelJoin(KeyValueStore<UserId, UserProfile> store, ParallelConsumer<UserId, UserEvent> pc) {
+        pc.poll(record -> {
+            UserId userId = record.key();
+            UserEvent userEvent = record.value();
+
+            UserProfile userProfile = store.get(userId);
+            if (userProfile != null) { // <1>
+                // join hit
+                // create payload with even details and call third party system, or produce a result message
+                userEvent.getEventPayload();
+                //....
+            } else { // <2>
+                // join miss
+                // drop - not registered devices for that user
+            }
+        });
+    }
+----
+<1> If the lookup is not null, it's a join hit: join the data any way you please, and potentially call an external system with the joined data, or output it back to a Kafka topic using the `#pollAndProduce` API if trying to avoid https://thorben-janssen.com/dual-writes/[dual write] scenarios.
+<2> If no matching item is present in the state store for the key, then it's a miss.
+
+See the link:{project_root}/parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/ParallelJoin.java[Parallel join KS example] class.
+
+WARNING: Performing a join outside of KS relinquishes ordering efforts KS applies to populating each side of the join - i.e. there is no effort to apply any ordering to the corresponding sides of this join.
+When joining within KS, this is taken care of for you.
+Be careful using this technique if your operation is sensitive to the order in which data is populated in the state store vs arriving from the event stream.
+
+
 [[ordering-guarantees]]
 == Ordering Guarantees
 
@@ -672,7 +704,7 @@ https://cwiki.apache.org/confluence/display/KAFKA/KIP-408%3A+Add+Asynchronous+Pr
 However, any given preprocessing can be done in KS, preparing the messages.
 One can then use this library to consume from an input topic, produced by KS to process the messages in parallel.
 
-For a code example, see the <<streams-usage-code>> section.
+For a code example, see the <<streams-usage-code>> and <<parallel-joins>> sections.
 
 .Example usage with Kafka Streams
 image::https://lucid.app/publicSegments/view/43f2740c-2a7f-4b7f-909e-434a5bbe3fbf/image.png[Kafka Streams Usage, align="center"]
diff --git a/parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/ParallelJoin.java b/parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/ParallelJoin.java
@@ -10,6 +10,7 @@
 
 public class ParallelJoin {
 
+    // tag::example[]
     /**
      * Needs KeyValueStore injected.
      */
@@ -18,17 +19,18 @@ public class ParallelJoin {
             UserId userId = record.key();
             UserEvent userEvent = record.value();
 
-            UserProfile userDeviceTokenRegistry = store.get(userId);
-            if (userDeviceTokenRegistry != null) {
+            UserProfile userProfile = store.get(userId);
+            if (userProfile != null) { // <1>
                 // join hit
                 // create payload with even details and call third party system, or produce a result message
                 userEvent.getEventPayload();
                 //....
-            } else {
+            } else { // <2>
                 // join miss
                 // drop - not registered devices for that user
             }
         });
     }
+    // end::example[]
 
 }
diff --git a/src/docs/README.adoc b/src/docs/README.adoc
@@ -23,12 +23,11 @@
 :base_url: https://github.com/confluentinc/{github_name}
 :issues_link: {base_url}/issues
 
-// dynamic include base for editing in IDEA
+// dynamic include base for rendering
 :project_root: ./
 
-// uncomment the following if not using IDEA or having issues, for editing the template to see the includes
-// note that with this line not commented out, the rendering of the root level asiidoc file will be incorrect (i.e.
-// leave it commented out when committing work)
+// uncomment the following if using IDEA or having issues, for editing the template to see the includes
+// note that with this line not commented out, the rendering of the root level asciidoc file will be incorrect (i.e. leave it commented out when committing work)
 //:project_root: ../../
 
 
@@ -187,9 +186,12 @@ The end-to-end latency of the responses to these answers needs to be as low as t
 * Kafka Streams app that had a slow stage
 ** We use Kafka Streams for our message processing, but one of it's steps have characteristics of the above and we need better performance.
 We can break out as described below into the tool for processing that step, then return to the Kafka Streams context.
+** Message load spikes causing heavy hot spots on join operations, where a disproportionate number of messages land on one partition.
+*** If keys cannot be adjusted or the topology modified to better distribute messages across partitions, consider performing a <<parallel-joins,parallel join operation>>.
 * Provisioning extra machines (either virtual machines or real machines) to run multiple clients has a cost, using this library instead avoids the need for extra instances to be deployed in any respect.
 
 
+
 == Feature List
 * Have massively parallel consumption processing without running hundreds or thousands of
 ** Kafka consumer clients
@@ -439,6 +441,7 @@ include::{project_root}/parallel-consumer-examples/parallel-consumer-example-ver
 
 See the link:{project_root}/parallel-consumer-examples/parallel-consumer-example-vertx/src/main/java/io/confluent/parallelconsumer/examples/vertx/VertxApp.java[Vert.x example] project, and it's test.
 
+
 [[streams-usage-code]]
 === Kafka Streams Concurrent Processing
 
@@ -447,31 +450,41 @@ Use your Streams app to process your data first, then send anything needed to be
 .Example usage with Kafka Streams
 image::https://lucid.app/publicSegments/view/43f2740c-2a7f-4b7f-909e-434a5bbe3fbf/image.png[Kafka Streams Usage, align="center"]
 
-==== Parallel Joins
+.Preprocess in Kafka Streams, then process concurrently
+[source,java,indent=0]
+----
+include::{project_root}/parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/StreamsApp.java[tag=example]
+----
+<1> Setup your Kafka Streams stage as per normal, performing any type of preprocessing in Kafka Streams
+<2> For the slow consumer part of your Topology, drop down into the parallel consumer, and use massive concurrency
+
+See the link:{project_root}/parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/StreamsApp.java[Kafka Streams example] project, and it's test.
+
 
-PC can also be used to implement a parallel join against a data source, namely a Kafka Streams (KS) state store.
+[[parallel-joins]]
+==== Parallel Joins
 
-KS state stores can be read by external threads (external to the toplogy context), and so they can be reference in the processing function of PC.
-If no item is present in the state store for the key, then it's a miss.
-If there is, you can join the data any way you please, and potentially call an external system with the joined data, or output it back to a Kafka topic using the `#pollAndProdce` API if trying to avoid https://thorben-janssen.com/dual-writes/[dual write] scnearios.
+PC can also be used to implement a parallel join against a data source, namely a Kafka Streams (KS) state store. KS state stores can be read by external threads (external to the topology context), and so they can be referenced in the processing function of PC.
 
 image::https://lucid.app/publicSegments/view/d144c027-653c-4e77-bfa4-8ecaadba1385/image.png[]
 
-WARNING::Although using a `GlobalKTable` is not strictly nesescary, the https://docs.confluent.io/platform/current/streams/developer-guide/interactive-queries.html[extra network hop] caused by data not being colocated when using a sharded `KTable` may negate the performance benefits in some scenarios.
+CAUTION: Although using a `GlobalKTable` is not strictly necessary, the https://docs.confluent.io/platform/current/streams/developer-guide/interactive-queries.html[extra network hop] caused by data not being collocated when using a sharded `KTable` may negate the performance benefits in some scenarios.
 For this reason, a `GlobalKTable` is recommended.
 
-WARNING::Performing a join outsode of KS relinquishes ordering efforts KS applies to populating each side of the join.
-Be careful using this technique if your operation is sensitive to the order in which data is populated in the state store vs arriving on the event stream.
-
-.Preprocess in Kafka Streams, then process concurrently
+.Parallel joins with Kafka Streams
 [source,java,indent=0]
 ----
-include::{project_root}/parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/StreamsApp.java[tag=example]
+include::{project_root}/parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/ParallelJoin.java[tag=example]
 ----
-<1> Setup your Kafka Streams stage as per normal, performing any type of preprocessing in Kafka Streams
-<2> For the slow consumer part of your Topology, drop down into the parallel consumer, and use massive concurrency
+<1> If the lookup is not null, it's a join hit: join the data any way you please, and potentially call an external system with the joined data, or output it back to a Kafka topic using the `#pollAndProduce` API if trying to avoid https://thorben-janssen.com/dual-writes/[dual write] scenarios.
+<2> If no matching item is present in the state store for the key, then it's a miss.
+
+See the link:{project_root}/parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/ParallelJoin.java[Parallel join KS example] class.
+
+WARNING: Performing a join outside of KS relinquishes ordering efforts KS applies to populating each side of the join - i.e. there is no effort to apply any ordering to the corresponding sides of this join.
+When joining within KS, this is taken care of for you.
+Be careful using this technique if your operation is sensitive to the order in which data is populated in the state store vs arriving from the event stream.
 
-See the link:{project_root}/parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/StreamsApp.java[Kafka Streams example] project, and it's test.
 
 [[ordering-guarantees]]
 == Ordering Guarantees
@@ -594,7 +607,7 @@ https://cwiki.apache.org/confluence/display/KAFKA/KIP-408%3A+Add+Asynchronous+Pr
 However, any given preprocessing can be done in KS, preparing the messages.
 One can then use this library to consume from an input topic, produced by KS to process the messages in parallel.
 
-For a code example, see the <<streams-usage-code>> section.
+For a code example, see the <<streams-usage-code>> and <<parallel-joins>> sections.
 
 .Example usage with Kafka Streams
 image::https://lucid.app/publicSegments/view/43f2740c-2a7f-4b7f-909e-434a5bbe3fbf/image.png[Kafka Streams Usage, align="center"]