This repository lets users check whether assets in Databricks Unity Catalog are stored within a specific location in cloud object storage. Supported assets include:
- Managed Catalogs
- Managed Schemas
- Managed Tables
- External Tables
Tip: To export the storage locations for all assets, regardless of where they are stored, set `external_location = ""`.
This gets all Managed Catalogs that have external_location as a part of their root path.
To run this function successfully, the user running it needs USE CATALOG permissions in UC for the catalogs being checked and SELECT permissions on `system.information_schema.catalogs`.
external_location = "abfss"
get_managed_catalogs(external_location)
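The matching rule ("has external_location as a part of its root path") can be pictured as a substring test. The sketch below is only an illustration under that assumption, not the repository's actual implementation: `filter_by_location`, the row layout, and the sample paths are all hypothetical. It also shows why an empty string would match every asset.

```python
def filter_by_location(rows, external_location):
    """Keep rows whose storage path contains external_location as a substring.

    Hypothetical helper: the real functions may match paths differently.
    """
    return [row for row in rows if external_location in row["path"]]


# Made-up sample rows, for illustration only.
rows = [
    {"name": "catalog_a", "path": "abfss://container@account.dfs.core.windows.net/a"},
    {"name": "catalog_b", "path": "s3://bucket/b"},
]

abfss_only = filter_by_location(rows, "abfss")
everything = filter_by_location(rows, "")  # "" is a substring of every path

print([r["name"] for r in abfss_only])   # ['catalog_a']
print([r["name"] for r in everything])   # ['catalog_a', 'catalog_b']
```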
This gets all Managed Schemas that have external_location as a part of their root path.
To run this function successfully, the user running it needs USE SCHEMA permissions in UC for the schemas being checked and SELECT permissions on `system.information_schema.schemata`.
To check Managed Schemas in a given Catalog:
get_managed_schemas(external_loc=external_location, catalog="gshen_catalog")
To check Managed Schemas across all catalogs:
get_managed_schemas(external_loc=external_location)
If you do not have access to `system.information_schema.schemata`, you can use the `system_table` parameter to switch to an `information_schema` you do have access to:
get_managed_schemas(external_loc=external_location, catalog="gshen_catalog", system_table="gshen_catalog.information_schema.schemata")
This gets all Managed and External tables that have external_location as a part of their root path.
To run this function successfully, the user running it needs SELECT permissions in UC for the tables being checked and SELECT permissions on `system.information_schema.tables`.
To check External Tables in a given catalog:
get_table_paths(external_loc=external_location, catalog="gshen_catalog", table_type="EXTERNAL")
To check Managed Tables in a given catalog and schema:
get_table_paths(external_loc=external_location, catalog="gshen_catalog", schema="data_blending", table_type="MANAGED")
To check External and Managed Tables across all catalogs and schemas:
get_table_paths(external_loc=external_location)
If you do not have access to `system.information_schema.tables`, you can use the `system_table` parameter to switch to an `information_schema` you do have access to:
get_table_paths(external_loc=external_location, catalog="gshen_catalog", system_table="gshen_catalog.information_schema.tables")
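The catalog-scoped names passed to `system_table` in the examples above follow the pattern `<catalog>.information_schema.<view>`. As a small convenience sketch, a helper like the one below could build those names; `catalog_information_schema` is a hypothetical function, not part of this repository, and it assumes the catalog name needs no quoting.

```python
def catalog_information_schema(catalog, view):
    """Build the three-part name of a catalog-scoped information_schema view.

    Hypothetical helper; assumes `catalog` is a plain identifier that
    does not need backtick quoting.
    """
    return f"{catalog}.information_schema.{view}"


schemata_view = catalog_information_schema("gshen_catalog", "schemata")
tables_view = catalog_information_schema("gshen_catalog", "tables")

print(schemata_view)  # gshen_catalog.information_schema.schemata
print(tables_view)    # gshen_catalog.information_schema.tables
```

The resulting string can then be passed as the `system_table` argument, for example `system_table=tables_view`.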