Skip to content

Conversation

@Ngone51
Copy link
Member

@Ngone51 Ngone51 commented Jan 6, 2021

What changes were proposed in this pull request?

This PR makes sure that SharedState would be only initiated once across SparkSessions.

Why are the changes needed?

This helps us to fix the listener(e.g., SQLAppStatusListener) leak issue, and allow SparkSessions created by getOrCreate to share the CacheManager, global temp view, etc.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added UTs.

@Ngone51
Copy link
Member Author

Ngone51 commented Jan 6, 2021

@cloud-fan @vinooganesh @advancedxy Please take a look, thanks!

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38301/

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38301/

@SparkQA
Copy link

SparkQA commented Jan 6, 2021

Test build #133713 has finished for PR 31053 at commit 38379d0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

seems we can also fix this issue by making sure there is only one SharedState instance. cc @yaooqinn

@yaooqinn
Copy link
Member

yaooqinn commented Jan 6, 2021

seems we can also fix this issue by making sure there is only one SharedState instance. cc @yaooqinn

right

@Ngone51
Copy link
Member Author

Ngone51 commented Jan 7, 2021

seems we can also fix this issue by making sure there is only one SharedState instance.

Seems like the SharedState, currently, is only shared between parent and child sessions. e.g., newSession() or cloneSession()...

If we want to make sure there's only one SharedState instance in this case, we need to make SparkSession.builder.getOrCreate to share the SharedState instance between the sibling sessions. Would it be ok?

Besides, although the only one SharedState instance can fixe SQLAppStatusListener leak, it leaves the ApplicationEndListener being fixed in another way(#28128). And addToQueueIfNotExists() could fix them in a more general way.

Thoughts?

@cloud-fan
Copy link
Contributor

I think it's an accident that the SharedState can be created more than once, as it holds the ExternalCatalog instance. If we fix that first, then this PR is just a refactor, as it doesn't fix any issues.

@vinooganesh
Copy link
Contributor

I agree with @cloud-fan and think it's easier just to ensure that SharedState can only be created once. Trying to compare listeners with their fully qualified class name seems more problematic than necessary (especially when there are so many different workflows that people use when setting up their listeners - including ones where different instances of a listener with the same fully qualified class name exist)

@Ngone51 Ngone51 changed the title [SPARK-32165][SQL][CORE] Add addToQueueIfNotExists to ensure some SQL listeners only register once across SparkSessions [SPARK-32165][SQL] Ensure Spark only initiates SharedState once across SparkSessions Jan 13, 2021
@Ngone51
Copy link
Member Author

Ngone51 commented Jan 13, 2021

Hey guys, I've updated the PR with the one SharedState way. Please take another look, thanks!

@SparkQA
Copy link

SparkQA commented Jan 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38595/

@SparkQA
Copy link

SparkQA commented Jan 13, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38595/

@SparkQA
Copy link

SparkQA commented Jan 13, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38600/

@SparkQA
Copy link

SparkQA commented Jan 13, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38600/

@SparkQA
Copy link

SparkQA commented Jan 13, 2021

Test build #134013 has finished for PR 31053 at commit 0d4c7d1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 13, 2021

Test build #134007 has finished for PR 31053 at commit 88bbd39.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38622/

@SparkQA
Copy link

SparkQA commented Jan 14, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38622/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it duplicated with checking assert(countListener("ExecutionListenerBus", context) === 2)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. Here we want to ensure other potential listeners won't leak across SparkSessions.

Copy link
Contributor

@jaceklaskowski jaceklaskowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (non-binding)

@SparkQA
Copy link

SparkQA commented Jan 14, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38637/

@SparkQA
Copy link

SparkQA commented Jan 14, 2021

Test build #134035 has finished for PR 31053 at commit 4bc644d.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 16, 2021

Test build #134134 has finished for PR 31053 at commit 9c32d18.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 16, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38717/

@SparkQA
Copy link

SparkQA commented Jan 16, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38717/

@Ngone51
Copy link
Member Author

Ngone51 commented Jan 19, 2021

retest this please

@SparkQA
Copy link

SparkQA commented Jan 19, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38795/

@SparkQA
Copy link

SparkQA commented Jan 19, 2021

Test build #134210 has finished for PR 31053 at commit 9c32d18.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 19, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38795/

@Ngone51 Ngone51 closed this Jan 20, 2021
@Ngone51 Ngone51 reopened this Jan 20, 2021
@Ngone51
Copy link
Member Author

Ngone51 commented Jan 20, 2021

retest this please

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38832/

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Test build #134247 has finished for PR 31053 at commit 9c32d18.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@maropu
Copy link
Member

maropu commented Jan 20, 2021

retest this please

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38832/

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38839/

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38839/

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Test build #134253 has finished for PR 31053 at commit 9c32d18.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@cloud-fan
Copy link
Contributor

retest this please

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38863/

@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38863/

@Ngone51 Ngone51 closed this Jan 20, 2021
@Ngone51 Ngone51 reopened this Jan 20, 2021
@SparkQA
Copy link

SparkQA commented Jan 20, 2021

Test build #134275 has finished for PR 31053 at commit 9c32d18.

  • This patch fails from timeout after a configured wait of 500m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@github-actions
Copy link

github-actions bot commented May 1, 2021

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label May 1, 2021
@github-actions github-actions bot closed this May 2, 2021
@dnskr
Copy link

dnskr commented Jan 16, 2022

@Ngone51 @cloud-fan Hi! The issue still exists. Can we reopen this PR? Otherwise please provide any comments why this cannot be merged or what should be done first?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants