Skip to content

Conversation

@sunchao
Copy link
Member

@sunchao sunchao commented Jan 22, 2021

This is a backport of #31107 to branch-3.1.

What changes were proposed in this pull request?

This passes original SQL text to CacheTableCommand command in DSv1 so that it will be stored instead of the analyzed logical plan, similar to CREATE VIEW command.

In addition, this changes the behavior of dropping temporary view to also invalidate dependent caches in a cascade, when the config SQLConf.STORE_ANALYZED_PLAN_FOR_VIEW is false (which is the default value).

Why are the changes needed?

Currently, after creating a temporary view with CACHE TABLE ... AS SELECT command, the view can still be queried even after the source table is dropped or replaced (in v2). This can cause correctness issue.

For instance, in the following:

> CREATE TABLE t ...;
> CACHE TABLE v AS SELECT * FROM t;
> DROP TABLE t;
> SELECT * FROM v;

The last select query still returns the old (and stale) result instead of fail. Note that the cache is already invalidated as part of dropping table t, but the temporary view v still exist.

On the other hand, the following:

> CREATE TABLE t ...;
> CREATE TEMPORARY VIEW v AS SELECT * FROM t;
> CACHE TABLE v;
> DROP TABLE t;
> SELECT * FROM v;

will throw "Table or view not found" error in the last select query.

This is related to #30567 which aligns the behavior of temporary view and global view by storing the original SQL text for temporary view, as opposed to the analyzed logical plan. However, the PR only handles CreateView case but not the CacheTableAsSelect case.

This also changes uncache logic and use cascade invalidation for temporary views created above. This is to align its behavior to how a permanent view is handled as of today, and also to avoid potential issues where a dependent view becomes invalid while its data is still kept in cache.

Does this PR introduce any user-facing change?

Yes, now when SQLConf.STORE_ANALYZED_PLAN_FOR_VIEW is set to false (the default value), whenever a table/permanent view/temp view that a cached view depends on is dropped, the cached view itself will become invalid during analysis, i.e., user will get "Table or view not found" error. In addition, when the dependent is a temp view in the previous case, the cache itself will also be invalidated.

How was this patch tested?

Added new test cases. Also modified and enhanced some existing related tests.

…E TABLE .. AS SELECT"

This passes original SQL text to `CacheTableAsSelect` command in DSv1 and v2 so that it will be stored instead of the analyzed logical plan, similar to `CREATE VIEW` command.

In addition, this changes the behavior of dropping temporary view to also invalidate dependent caches in a cascade, when the config `SQLConf.STORE_ANALYZED_PLAN_FOR_VIEW` is false (which is the default value).

Currently, after creating a temporary view with `CACHE TABLE ... AS SELECT` command, the view can still be queried even after the source table is dropped or replaced (in v2). This can cause correctness issue.

For instance, in the following:
```sql
> CREATE TABLE t ...;
> CACHE TABLE v AS SELECT * FROM t;
> DROP TABLE t;
> SELECT * FROM v;
```
The last select query still returns the old (and stale) result instead of fail. Note that the cache is already invalidated as part of dropping table `t`, but the temporary view `v` still exist.

On the other hand, the following:
```sql
> CREATE TABLE t ...;
> CREATE TEMPORARY VIEW v AS SELECT * FROM t;
> CACHE TABLE v;
> DROP TABLE t;
> SELECT * FROM v;
```
will throw "Table or view not found" error in the last select query.

This is related to apache#30567 which aligns the behavior of temporary view and global view by storing the original SQL text for temporary view, as opposed to the analyzed logical plan. However, the PR only handles `CreateView` case but not the `CacheTableAsSelect` case.

This also changes uncache logic and use cascade invalidation for temporary views created above. This is to align its behavior to how a permanent view is handled as of today, and also to avoid potential issues where a dependent view becomes invalid while its data is still kept in cache.

Yes, now when `SQLConf.STORE_ANALYZED_PLAN_FOR_VIEW` is set to false (the default value), whenever a table/permanent view/temp view that a cached view depends on is dropped, the cached view itself will become invalid during analysis, i.e., user will get "Table or view not found" error. In addition, when the dependent is a temp view in the previous case, the cache itself will also be invalidated.

Modified/Enhanced some existing tests.

Closes apache#31107 from sunchao/SPARK-34052.

Lead-authored-by: Chao Sun <[email protected]>
Co-authored-by: Chao Sun <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@github-actions github-actions bot added the SQL label Jan 22, 2021
@SparkQA
Copy link

SparkQA commented Jan 22, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38971/

@SparkQA
Copy link

SparkQA commented Jan 22, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38971/

@SparkQA
Copy link

SparkQA commented Jan 23, 2021

Test build #134385 has finished for PR 31300 at commit 3a1e4f1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 23, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38978/

@sunchao
Copy link
Member Author

sunchao commented Jan 23, 2021

@cloud-fan @viirya could you review this? thanks!

@SparkQA
Copy link

SparkQA commented Jan 23, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/38978/

@SparkQA
Copy link

SparkQA commented Jan 23, 2021

Test build #134392 has finished for PR 31300 at commit 57a07ac.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member

Merged to branch-3.1

HyukjinKwon pushed a commit that referenced this pull request Jan 24, 2021
…"CACHE TABLE .. AS SELECT ..."

This is a backport of #31107 to branch-3.1.

### What changes were proposed in this pull request?

This passes original SQL text to `CacheTableCommand` command in DSv1 so that it will be stored instead of the analyzed logical plan, similar to `CREATE VIEW` command.

In addition, this changes the behavior of dropping temporary view to also invalidate dependent caches in a cascade, when the config `SQLConf.STORE_ANALYZED_PLAN_FOR_VIEW` is false (which is the default value).

### Why are the changes needed?

Currently, after creating a temporary view with `CACHE TABLE ... AS SELECT` command, the view can still be queried even after the source table is dropped or replaced (in v2). This can cause correctness issue.

For instance, in the following:
```sql
> CREATE TABLE t ...;
> CACHE TABLE v AS SELECT * FROM t;
> DROP TABLE t;
> SELECT * FROM v;
```
The last select query still returns the old (and stale) result instead of fail. Note that the cache is already invalidated as part of dropping table `t`, but the temporary view `v` still exist.

On the other hand, the following:
```sql
> CREATE TABLE t ...;
> CREATE TEMPORARY VIEW v AS SELECT * FROM t;
> CACHE TABLE v;
> DROP TABLE t;
> SELECT * FROM v;
```
will throw "Table or view not found" error in the last select query.

This is related to #30567 which aligns the behavior of temporary view and global view by storing the original SQL text for temporary view, as opposed to the analyzed logical plan. However, the PR only handles `CreateView` case but not the `CacheTableAsSelect` case.

This also changes uncache logic and use cascade invalidation for temporary views created above. This is to align its behavior to how a permanent view is handled as of today, and also to avoid potential issues where a dependent view becomes invalid while its data is still kept in cache.

### Does this PR introduce _any_ user-facing change?

Yes, now when `SQLConf.STORE_ANALYZED_PLAN_FOR_VIEW` is set to false (the default value), whenever a table/permanent view/temp view that a cached view depends on is dropped, the cached view itself will become invalid during analysis, i.e., user will get "Table or view not found" error. In addition, when the dependent is a temp view in the previous case, the cache itself will also be invalidated.

### How was this patch tested?

Added new test cases. Also modified and enhanced some existing related tests.

Closes #31300 from sunchao/SPARK-34052-branch-3.1.

Authored-by: Chao Sun <[email protected]>
Signed-off-by: HyukjinKwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants