@astubbs astubbs commented Mar 1, 2021

@astubbs astubbs force-pushed the parallel-join-technique branch from 04952cc to 1317fe7 Compare March 2, 2021 01:47
WARNING: Performing a join outside of KS relinquishes ordering efforts KS applies to populating each side of the join - i.e. there is no effort to apply any ordering to the corresponding sides of this join.
When joining within KS, this is taken care of for you.
Be careful using this technique if your operation is sensitive to the order in which data is populated in the state store vs arriving from the event stream.


@JorgenRingen JorgenRingen Mar 3, 2021

Time synchronization of joins in KS isn't done for GlobalKTables (and interactive queries?) anyway 🤔

 Another important difference between a KTable and a GlobalKTable is time synchronization: while processing KTable records is time synchronized based on record timestamps to all other streams, a GlobalKTable is not time synchronized.

https://www.confluent.io/blog/crossing-streams-joins-apache-kafka/

Do you know if there is any way to make sure that a GlobalKTable has been populated before parallel-consumer processing starts, to avoid or minimize the chance of join-misses? GlobalKTables are populated in a separate thread. It's usually super-fast, but I have had some issues with join-misses in KS for stream-globaltable joins during startup if the globaltable is very large.

Contributor Author

GKTs work a bit differently: they "fully" hydrate on application start. That is, the head offset of the input topic is marked at start, the topic is then loaded up to that point, and after that it's just whatever hits first. So there is some effort there; I'm just trying to point out that it's different and should be understood.
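
To make the "mark the head offset, then load up to it" idea concrete, here is a minimal sketch in plain Java. `BootstrapProgress` is a hypothetical helper, not a Kafka Streams or parallel-consumer API, and it only models the bookkeeping: end offsets are captured once at startup, and hydration counts as complete once every partition's position has reached its captured offset. It is not a description of how KS implements this internally.

```java
import java.util.Map;

/** Hypothetical helper: models "hydrated up to the offsets seen at startup". */
final class BootstrapProgress {

    // partition -> end offset captured once, when the application started
    private final Map<Integer, Long> targetOffsets;

    BootstrapProgress(Map<Integer, Long> endOffsetsAtStart) {
        this.targetOffsets = Map.copyOf(endOffsetsAtStart);
    }

    /** True once every partition's position has reached its captured target. */
    boolean isHydrated(Map<Integer, Long> currentPositions) {
        for (Map.Entry<Integer, Long> target : targetOffsets.entrySet()) {
            // A partition with no reported position counts as not started (0).
            long position = currentPositions.getOrDefault(target.getKey(), 0L);
            if (position < target.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

Records that arrive after the snapshot are deliberately ignored here, which matches the "then it's just whatever hits first" behaviour described above. In a real application the two maps would have to come from somewhere like a consumer's end offsets and positions; that wiring is left out as an assumption.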

Ah yes, trying to sync with PC separately... 🤔 That's a great question. You could do something janky like read the head message and wait until it can be queried from the GKT. Otherwise, ideally you could hook into an event listener system to know when the GKT bootstrap has finished. I guess there isn't anything for this currently. Oh, actually, it might be represented in the run state of KS; that's worth looking into. Thanks for the heads up!
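
A rough sketch of the "janky" option, in plain Java so it runs without Kafka on the classpath. `StoreReadinessGate` and `awaitKey` are hypothetical names, not part of Kafka Streams or parallel-consumer. The `lookup` function stands in for whatever point query the GKT's store exposes, and `sentinelKey` is assumed to be the key of the head message read from the input topic. The gate polls until that key is visible or a timeout elapses.

```java
import java.time.Duration;
import java.util.function.Function;

/** Hypothetical helper: block until a sentinel key becomes queryable. */
final class StoreReadinessGate {

    private StoreReadinessGate() {}

    /**
     * Polls lookup until it returns a non-null value for sentinelKey.
     *
     * @return true if the key became visible, false if the timeout elapsed first
     */
    static <K, V> boolean awaitKey(Function<K, V> lookup, K sentinelKey,
                                   Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (lookup.apply(sentinelKey) != null) {
                return true;  // the store has caught up at least to the sentinel
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up; the caller decides whether to start anyway
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }
}
```

The idea would be to call `awaitKey` before starting parallel-consumer processing, passing something like the store's `get` method as `lookup`. Note the limits: this only shows that the sentinel record is visible, so it narrows the startup window for join-misses rather than giving any ordering guarantee afterwards. It also assumes the head message's key has not been deleted (tombstoned) in the meantime.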

astubbs commented Mar 17, 2021

Needs some suggestions on synchronising with an external KS app, as @JorgenRingen says.

@astubbs astubbs force-pushed the master branch 5 times, most recently from 5c841c3 to bc85ba3 Compare March 30, 2021 10:25
@astubbs astubbs marked this pull request as draft July 13, 2021 13:19
@astubbs astubbs force-pushed the master branch 5 times, most recently from 6312a34 to 3ad49c2 Compare September 9, 2021 09:40
@astubbs astubbs force-pushed the master branch 2 times, most recently from 4c62a9f to b5b166f Compare February 16, 2022 17:41
@astubbs astubbs mentioned this pull request May 16, 2022
64 tasks
@eddyv eddyv closed this Jun 15, 2023
3 participants