Skip to content

Conversation

@imback82
Copy link
Contributor

What changes were proposed in this pull request?

This is a back port of #30774.

This PR proposes to retain the cache's storage level when a table name is altered by ALTER TABLE ... RENAME TO ....

Why are the changes needed?

Currently, when a table name is altered, the table's cache is refreshed (if exists), but the storage level is not retained. For example:

        def getStorageLevel(tableName: String): StorageLevel = {
          val table = spark.table(tableName)
          val cachedData = spark.sharedState.cacheManager.lookupCachedData(table).get
          cachedData.cachedRepresentation.cacheBuilder.storageLevel
        }

        Seq(1 -> "a").toDF("i", "j").write.parquet(path.getCanonicalPath)
        sql(s"CREATE TABLE old USING parquet LOCATION '${path.toURI}'")
        sql("CACHE TABLE old OPTIONS('storageLevel' 'MEMORY_ONLY')")
        val oldStorageLevel = getStorageLevel("old")

        sql("ALTER TABLE old RENAME TO new")
        val newStorageLevel = getStorageLevel("new")

oldStorageLevel will be StorageLevel(memory, deserialized, 1 replicas) whereas newStorageLevel will be StorageLevel(disk, memory, deserialized, 1 replicas), which is the default storage level.

Does this PR introduce any user-facing change?

Yes, now the storage level for the cache will be retained.

How was this patch tested?

Added a unit test.

@imback82
Copy link
Contributor Author

cc @cloud-fan

@SparkQA
Copy link

SparkQA commented Dec 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37465/

@SparkQA
Copy link

SparkQA commented Dec 16, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/37465/

@SparkQA
Copy link

SparkQA commented Dec 16, 2020

Test build #132863 has finished for PR 30793 at commit 01aee31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/132863/

@cloud-fan
Copy link
Contributor

thanks, merging to 3.0!

cloud-fan pushed a commit that referenced this pull request Dec 16, 2020
…cted when a table name is altered

### What changes were proposed in this pull request?

This is a back port of #30774.

This PR proposes to retain the cache's storage level when a table name is altered by `ALTER TABLE ... RENAME TO ...`.

### Why are the changes needed?

Currently, when a table name is altered, the table's cache is refreshed (if exists), but the storage level is not retained. For example:
```scala
        def getStorageLevel(tableName: String): StorageLevel = {
          val table = spark.table(tableName)
          val cachedData = spark.sharedState.cacheManager.lookupCachedData(table).get
          cachedData.cachedRepresentation.cacheBuilder.storageLevel
        }

        Seq(1 -> "a").toDF("i", "j").write.parquet(path.getCanonicalPath)
        sql(s"CREATE TABLE old USING parquet LOCATION '${path.toURI}'")
        sql("CACHE TABLE old OPTIONS('storageLevel' 'MEMORY_ONLY')")
        val oldStorageLevel = getStorageLevel("old")

        sql("ALTER TABLE old RENAME TO new")
        val newStorageLevel = getStorageLevel("new")
```
`oldStorageLevel` will be `StorageLevel(memory, deserialized, 1 replicas)` whereas `newStorageLevel` will be `StorageLevel(disk, memory, deserialized, 1 replicas)`, which is the default storage level.

### Does this PR introduce _any_ user-facing change?

Yes, now the storage level for the cache will be retained.

### How was this patch tested?

Added a unit test.

Closes #30793 from imback82/alter_table_rename_cache_fix_3.0.

Authored-by: Terry Kim <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@cloud-fan cloud-fan closed this Dec 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants