// uncomment the following if not using IDEA or having issues, for editing the template to see the includes
30
-
// note that with this line not commented out, the rendering of the root level asiidoc file will be incorrect (i.e.
31
-
// leave it commented out when committing work)
29
+
// uncomment the following if using IDEA or having issues, for editing the template to see the includes
30
+
// note that with this line not commented out, the rendering of the root level AsciiDoc file will be incorrect (i.e. leave it commented out when committing work)
32
31
//:project_root: ../../
33
32
34
33
@@ -189,9 +188,12 @@ The end-to-end latency of the responses to these answers needs to be as low as t
189
188
* Kafka Streams app that had a slow stage
190
189
** We use Kafka Streams for our message processing, but one of its steps has the characteristics described above and we need better performance.
191
190
We can break out into this tool, as described below, to process that step, then return to the Kafka Streams context.
191
+
** Message load spikes cause heavy hot spots on join operations, where a disproportionate number of messages land on one partition.
192
+
*** If keys cannot be adjusted or the topology modified to better distribute messages across partitions, consider performing a <<parallel-joins,parallel join operation>>.
192
193
* Provisioning extra machines (either virtual machines or real machines) to run multiple clients has a cost. Using this library instead avoids the need to deploy extra instances at all.
193
194
194
195
196
+
195
197
== Feature List
196
198
* Have massively parallel consumption processing without running hundreds or thousands of
197
199
** Kafka consumer clients
@@ -488,6 +490,7 @@ In future versions, we plan to look at supporting other streaming systems like h
488
490
489
491
See the link:{project_root}/parallel-consumer-examples/parallel-consumer-example-vertx/src/main/java/io/confluent/parallelconsumer/examples/vertx/VertxApp.java[Vert.x example] project, and its test.
490
492
493
+
491
494
[[streams-usage-code]]
492
495
=== Kafka Streams Concurrent Processing
493
496
@@ -496,22 +499,6 @@ Use your Streams app to process your data first, then send anything needed to be
PC can also be used to implement a parallel join against a data source, namely a Kafka Streams (KS) state store.
502
-
503
-
KS state stores can be read by external threads (external to the toplogy context), and so they can be reference in the processing function of PC.
504
-
If no item is present in the state store for the key, then it's a miss.
505
-
If there is, you can join the data any way you please, and potentially call an external system with the joined data, or output it back to a Kafka topic using the `#pollAndProdce` API if trying to avoid https://thorben-janssen.com/dual-writes/[dual write] scnearios.
WARNING::Although using a `GlobalKTable` is not strictly nesescary, the https://docs.confluent.io/platform/current/streams/developer-guide/interactive-queries.html[extra network hop] caused by data not being colocated when using a sharded `KTable` may negate the performance benefits in some scenarios.
510
-
For this reason, a `GlobalKTable` is recommended.
511
-
512
-
WARNING::Performing a join outsode of KS relinquishes ordering efforts KS applies to populating each side of the join.
513
-
Be careful using this technique if your operation is sensitive to the order in which data is populated in the state store vs arriving on the event stream.
514
-
515
502
.Preprocess in Kafka Streams, then process concurrently
516
503
[source,java,indent=0]
517
504
----
@@ -551,6 +538,51 @@ Be careful using this technique if your operation is sensitive to the order in w
551
538
552
539
See the link:{project_root}/parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/StreamsApp.java[Kafka Streams example] project, and its test.
553
540
541
+
542
+
[[parallel-joins]]
543
+
==== Parallel Joins
544
+
545
+
PC can also be used to implement a parallel join against a data source, namely a Kafka Streams (KS) state store. KS state stores can be read by external threads (external to the topology context), and so they can be referenced in the processing function of PC.
CAUTION: Although using a `GlobalKTable` is not strictly necessary, the https://docs.confluent.io/platform/current/streams/developer-guide/interactive-queries.html[extra network hop] caused by data not being co-located when using a sharded `KTable` may negate the performance benefits in some scenarios.
// create payload with event details and call third party system, or produce a result message
567
+
userEvent.getEventPayload();
568
+
//....
569
+
} else { // <2>
570
+
// join miss
571
+
// drop - no registered devices for that user
572
+
}
573
+
});
574
+
}
575
+
----
576
+
<1> If the lookup is not null, it's a join hit: join the data any way you please, and potentially call an external system with the joined data, or output it back to a Kafka topic using the `#pollAndProduce` API if trying to avoid https://thorben-janssen.com/dual-writes/[dual write] scenarios.
577
+
<2> If no matching item is present in the state store for the key, then it's a miss.
578
+
579
+
See the link:{project_root}/parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/ParallelJoin.java[Parallel join KS example] class.
580
+
581
+
WARNING: Performing a join outside of KS gives up the ordering that KS applies when populating each side of the join, i.e. nothing orders the two sides of this join relative to each other.
582
+
When joining within KS, this is taken care of for you.
583
+
Be careful using this technique if your operation is sensitive to the order in which data is populated in the state store vs arriving from the event stream.
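
The lookup-based join described above can be sketched with plain JDK types. This is only an illustration under stated assumptions, not the code from the example project: `ParallelJoinSketch`, `STORE`, `UserEvent`, `join` and `joinAll` are hypothetical names, a `ConcurrentHashMap` stands in for the KS state store, and a fixed thread pool stands in for the concurrency that PC would provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelJoinSketch {

    // Hypothetical stand-in for a KS state store that is safe to read from other threads.
    static final Map<String, String> STORE = new ConcurrentHashMap<>();

    // Hypothetical event type: the key is the user id, the payload is the event body.
    static final class UserEvent {
        final String userId;
        final String payload;

        UserEvent(String userId, String payload) {
            this.userId = userId;
            this.payload = payload;
        }
    }

    // The join itself: a key lookup, with an empty result on a miss.
    static Optional<String> join(UserEvent event) {
        String stored = STORE.get(event.userId);
        if (stored == null) {
            return Optional.empty(); // join miss: nothing stored for this key
        }
        return Optional.of(stored + ":" + event.payload); // join hit
    }

    // Runs the lookups on a thread pool and returns the results in input order.
    static List<Optional<String>> joinAll(List<UserEvent> events) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Optional<String>>> futures = new ArrayList<>();
            for (UserEvent event : events) {
                futures.add(pool.submit(() -> join(event)));
            }
            List<Optional<String>> results = new ArrayList<>();
            for (Future<Optional<String>> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        STORE.put("alice", "device-1");
        List<Optional<String>> results = joinAll(Arrays.asList(
                new UserEvent("alice", "login"),
                new UserEvent("bob", "login")));
        System.out.println(results);
    }
}
```

Because each lookup is independent of the others, the work for a hot key can be spread across threads rather than serialised behind a single partition consumer. The sketch makes no attempt to order the store side against the event side.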
parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/ParallelJoin.java
// uncomment the following if not using IDEA or having issues, for editing the template to see the includes
30
-
// note that with this line not commented out, the rendering of the root level asiidoc file will be incorrect (i.e.
31
-
// leave it commented out when committing work)
29
+
// uncomment the following if using IDEA or having issues, for editing the template to see the includes
30
+
// note that with this line not commented out, the rendering of the root level AsciiDoc file will be incorrect (i.e. leave it commented out when committing work)
32
31
//:project_root: ../../
33
32
34
33
@@ -187,9 +186,12 @@ The end-to-end latency of the responses to these answers needs to be as low as t
187
186
* Kafka Streams app that had a slow stage
188
187
** We use Kafka Streams for our message processing, but one of its steps has the characteristics described above and we need better performance.
189
188
We can break out into this tool, as described below, to process that step, then return to the Kafka Streams context.
189
+
** Message load spikes cause heavy hot spots on join operations, where a disproportionate number of messages land on one partition.
190
+
*** If keys cannot be adjusted or the topology modified to better distribute messages across partitions, consider performing a <<parallel-joins,parallel join operation>>.
190
191
* Provisioning extra machines (either virtual machines or real machines) to run multiple clients has a cost. Using this library instead avoids the need to deploy extra instances at all.
191
192
192
193
194
+
193
195
== Feature List
194
196
* Have massively parallel consumption processing without running hundreds or thousands of
See the link:{project_root}/parallel-consumer-examples/parallel-consumer-example-vertx/src/main/java/io/confluent/parallelconsumer/examples/vertx/VertxApp.java[Vert.x example] project, and its test.
441
443
444
+
442
445
[[streams-usage-code]]
443
446
=== Kafka Streams Concurrent Processing
444
447
@@ -447,31 +450,41 @@ Use your Streams app to process your data first, then send anything needed to be
<1> Set up your Kafka Streams stage as per normal, performing any type of preprocessing in Kafka Streams
459
+
<2> For the slow consumer part of your Topology, drop down into the parallel consumer, and use massive concurrency
460
+
461
+
See the link:{project_root}/parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/StreamsApp.java[Kafka Streams example] project, and its test.
462
+
451
463
452
-
PC can also be used to implement a parallel join against a data source, namely a Kafka Streams (KS) state store.
464
+
[[parallel-joins]]
465
+
==== Parallel Joins
453
466
454
-
KS state stores can be read by external threads (external to the toplogy context), and so they can be reference in the processing function of PC.
455
-
If no item is present in the state store for the key, then it's a miss.
456
-
If there is, you can join the data any way you please, and potentially call an external system with the joined data, or output it back to a Kafka topic using the `#pollAndProdce` API if trying to avoid https://thorben-janssen.com/dual-writes/[dual write] scnearios.
467
+
PC can also be used to implement a parallel join against a data source, namely a Kafka Streams (KS) state store. KS state stores can be read by external threads (external to the topology context), and so they can be referenced in the processing function of PC.
WARNING::Although using a `GlobalKTable` is not strictly nesescary, the https://docs.confluent.io/platform/current/streams/developer-guide/interactive-queries.html[extra network hop] caused by data not being colocated when using a sharded `KTable` may negate the performance benefits in some scenarios.
471
+
CAUTION: Although using a `GlobalKTable` is not strictly necessary, the https://docs.confluent.io/platform/current/streams/developer-guide/interactive-queries.html[extra network hop] caused by data not being co-located when using a sharded `KTable` may negate the performance benefits in some scenarios.
461
472
For this reason, a `GlobalKTable` is recommended.
462
473
463
-
WARNING::Performing a join outsode of KS relinquishes ordering efforts KS applies to populating each side of the join.
464
-
Be careful using this technique if your operation is sensitive to the order in which data is populated in the state store vs arriving on the event stream.
465
-
466
-
.Preprocess in Kafka Streams, then process concurrently
<1> Setup your Kafka Streams stage as per normal, performing any type of preprocessing in Kafka Streams
472
-
<2> For the slow consumer part of your Topology, drop down into the parallel consumer, and use massive concurrency
479
+
<1> If the lookup is not null, it's a join hit: join the data any way you please, and potentially call an external system with the joined data, or output it back to a Kafka topic using the `#pollAndProduce` API if trying to avoid https://thorben-janssen.com/dual-writes/[dual write] scenarios.
480
+
<2> If no matching item is present in the state store for the key, then it's a miss.
481
+
482
+
See the link:{project_root}/parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/ParallelJoin.java[Parallel join KS example] class.
483
+
484
+
WARNING: Performing a join outside of KS gives up the ordering that KS applies when populating each side of the join, i.e. nothing orders the two sides of this join relative to each other.
485
+
When joining within KS, this is taken care of for you.
486
+
Be careful using this technique if your operation is sensitive to the order in which data is populated in the state store vs arriving from the event stream.
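
The ordering sensitivity in this warning can be shown with a deliberately tiny, single-threaded sketch. It assumes nothing about KS or PC internals: `JoinOrderingSketch`, `STORE` and `lookup` are hypothetical names, and a `HashMap` stands in for the state store side of the join.

```java
import java.util.HashMap;
import java.util.Map;

class JoinOrderingSketch {

    // Hypothetical stand-in for the state store side of the join.
    static final Map<String, String> STORE = new HashMap<>();

    // Returns "hit" or "miss" depending on what the store holds at lookup time.
    static String lookup(String key) {
        return STORE.containsKey(key) ? "hit" : "miss";
    }

    public static void main(String[] args) {
        // The event is processed before the store side has been populated.
        String early = lookup("alice");
        // The store catches up afterwards.
        STORE.put("alice", "device-1");
        // The same key, looked up later, now joins.
        String late = lookup("alice");
        System.out.println(early + " then " + late);
    }
}
```

The same event yields a different join result depending only on when it is processed relative to the store update, which is the situation the warning asks you to check for.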
473
487
474
-
See the link:{project_root}/parallel-consumer-examples/parallel-consumer-example-streams/src/main/java/io/confluent/parallelconsumer/examples/streams/StreamsApp.java[Kafka Streams example] project, and it's test.