This repository was archived by the owner on Aug 23, 2025. It is now read-only.
Metadata API Example to analyze CustomSQL prevalence on a Tableau Site #3
Open

grbritz wants to merge 16 commits into tableau:master from grbritz:master
Changes from all commits (16 commits):

- 2a8d657 Updates to fix CSV output, support command line prompts, and move to …
- e5733e9 full customsql stats script sans Prep customsql
- 730fd1d small cleanup on redundant copy and paste
- 2e9f61f small comment tweaks
- 79f3edb Merge pull request #1 from grbritz/custom-sql-processing-script
- 0eedc83 more error protection for empty sites
- 98316b7 Added more status reporting to terminal for showing progress on large…
- 5c11861 Added requirements for use; Disabled SSL checks since script runs on …
- ec2b60c added more error protection for unexpected GraphQL responses for Cust…
- 5f457dd customSQL -> Custom SQL
- bc3b0fb Touch ups from code review
- 487f5fb CustomSQL -> Custom SQL, part 2
- 5d1b0c1 CustomSQL -> Custom SQL, part 3
- a0b1367 added optional method to use personal access token
- 6d304b6 added utf-8 encoding to CSV output
- 4f4a789 Handle multi-line SQL better
Files changed:
```
@@ -0,0 +1,4 @@
*.csv
*.txt
./Sample Files/
better-csv.py
```
README for the sample (new file, 29 lines added):

# Overview

The custom-sql-statistics Python script uses the Metadata API to find how much Custom SQL is used on a Tableau Site. It reports what percentage of Workbooks and Data Sources use Custom SQL, and it tracks how many Custom SQL queries were not accepted/supported by the Catalog lineage parser. Please refer to the [official documentation for Catalog's support for Custom SQL](https://help.tableau.com/current/pro/desktop/en-us/customsql.htm#tableau-catalog-support-for-custom-sql) for more information on what SQL Catalog is intended to support.

This script reports its statistics in two ways:
1. A brief summary of how much Custom SQL is on the site and how much of it was supported, written to `customSQL-stats-summary.txt`.
2. Row-level data written to `customSQL-stats.csv` for deeper analysis. **Note:** this format includes the actual SQL query text used in the Data Source or Workbook (it does not include any data returned by the query, just the SQL itself).

## Instructions for use
Use the latest version of Python 3 and call the script like so:
```
python3 custom-sql-statistics.py --server <http://example.com> --username <your server/site username> --sitename <only needed for non-default sites>
```
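The script can also authenticate with a personal access token instead of a username (the `--token-name` flag in its argument parser); the token secret itself is prompted for at runtime. A hypothetical invocation, with placeholder values, would look like:
```
python3 custom-sql-statistics.py --server <http://example.com> --token-name <your PAT name> --sitename <only needed for non-default sites>
```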
The script's results are written to two files: `customSQL-stats-summary.txt`, which is a summary of the data, and `customSQL-stats.csv`, which gives row-level data on the results. The latter can be used in Tableau for more thorough analysis.
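Outside of Tableau, the CSV can also be summarized with a few lines of Python. Below is a minimal sketch, assuming the default output path and the column names the script writes (`sql_failed_to_parse`, `database_type`); it is an illustration, not part of the sample itself:

```python
import csv
from collections import Counter

# Count Custom SQL tables that Catalog failed to parse, grouped by database type.
failed_by_db = Counter()
with open('customSQL-stats.csv', newline='', encoding='utf-8') as f:
    for row in csv.DictReader(f):
        if row['sql_failed_to_parse'] == 'True':  # csv.writer stores the boolean as text
            failed_by_db[row['database_type']] += 1

for db_type, count in failed_by_db.most_common():
    print("{}: {} unparsed Custom SQL tables".format(db_type, count))
```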
For best results, run this script as a site or server admin so that full query data can be returned. If run as a non-admin, the `query_string` values may be blank due to permissions.

Note that on a large server with a lot of Custom SQL usage, this script may take a long time to run, since it iterates through all content on the server via the Metadata API.

## Requirements
* Tableau Server version must be 2020.1 or greater.
* Metadata Services must be enabled on the server: [Enable Metadata Services](https://help.tableau.com/current/server/en-us/dm_catalog_enable.htm#step6)
* This script uses Python 3 and was tested on Python 3.7.
* This script relies on the [Tableau Server Client python library](https://github.com/tableau/server-client-python).

### Known limitations
This script does not include Custom SQL used in Prep flows.
`custom-sql-statistics.py` (new file, 283 lines added):

```python
####
# This script can be used for getting more information about Custom SQL prevalence on a Tableau Server/Site.
#
# This script was written on Python 3.7.6 and was not tested to work on other versions of Python.
####

import argparse
import getpass
import logging
import requests
import csv
from datetime import time
import tableauserverclient as TSC

### The below two lines suppress SSL warnings. Oftentimes this script is used
### on servers within a corporate firewall that have their own self-signed SSL certificates.
### Those self-signed certificates will throw errors without this suppression.
### If you do not want to suppress SSL warnings, comment out these lines.
from requests.packages.urllib3.exceptions import InsecureRequestWarning
requests.packages.urllib3.disable_warnings(InsecureRequestWarning)


# See more examples here https://help.tableau.com/current/api/metadata_api/en-us/docs/meta_api_examples.html


def main():
    parser = argparse.ArgumentParser(description='Reports on Custom SQL statistics in the Catalog graph. Outputs data into CSV files for reporting.')
    parser.add_argument('--server', '-s', required=True, help='server address (include "http(s)://")')
    parser.add_argument('--logging-level', '-l', choices=['debug', 'info', 'error'], default='error',
                        help='desired logging level (set to error by default)')
    parser.add_argument('--sitename', '-site', default=None, help='Sitename (or contentURL) to process Custom SQL statistics for. This is optional and defaults to the `Default` site. Go here for more information: https://tableau.github.io/server-client-python/docs/sign-in-out')

    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--username', '-u', help='username to sign into server')
    group.add_argument('--token-name', '-token', help='name of the personal access token used to sign into the server')
    args = parser.parse_args()

    # Set logging level based on user input, or error by default
    logging_level = getattr(logging, args.logging_level.upper())
    logging.basicConfig(level=logging_level)

    server = TSC.Server(args.server)
    server.add_http_options({"verify": False})
    server.use_server_version()

    if args.username:
        ## Sign in with username/password
        password = getpass.getpass("Password: ")
        tableau_auth = TSC.TableauAuth(args.username, password, args.sitename)
        server.auth.sign_in(tableau_auth)
    else:
        ## Sign in with personal access token
        personal_access_token = input("Personal Access Token: ")
        tableau_auth = TSC.PersonalAccessTokenAuth(token_name=args.token_name, personal_access_token=personal_access_token, site_id=args.sitename)
        server.auth.sign_in_with_personal_access_token(tableau_auth)

    logging.debug("Signed into Server")

    query = """
    {
      customSQLTablesConnection(first: 20, after: AFTER_TOKEN_SIGNAL) {
        nodes {
          id

          database {
            connectionType
          }

          tables {
            id
          }

          query

          columns {
            workbooks_directly_connected: referencedByFields {
              datasource {
                ... on EmbeddedDatasource {
                  workbook {
                    name
                    id
                  }
                }
              }
            }
            datasources_directly_connected: referencedByFields {
              datasource {
                ... on PublishedDatasource {
                  name
                  id
                }
              }
            }
          }
        }
        pageInfo {
          hasNextPage
          endCursor
        }
      }
    }
    """
    print("--------------------------\nBeginning to query information about Custom SQL tables on this site...")
    resp = server.metadata.query(query.replace('AFTER_TOKEN_SIGNAL', 'null'))
    workbooks = {}
    datasources = {}
    table_stats = {'num_no_columns': 0, 'num_no_database': 0, 'num_no_workbooks_or_ds_connected': 0, 'num_tables_seen': 0, 'num_failed_parse': 0, 'num_no_connected_tables': 0}
    while True:
        process_page(resp, workbooks, datasources, table_stats)

        page_info = resp['data']['customSQLTablesConnection']['pageInfo']

        print("--------------------------\nProcessing update:")
        print(table_stats)
        if page_info['hasNextPage']:
            resp = server.metadata.query(query.replace('AFTER_TOKEN_SIGNAL', '"' + page_info['endCursor'] + '"'))
        else:
            break

    total_skipped = table_stats['num_no_columns'] + table_stats['num_no_database'] + table_stats['num_no_workbooks_or_ds_connected']
    logging.debug("{} Custom SQL tables were skipped due to unexpected data".format(total_skipped))
    totalCountsQuery = """
    {
      total_workbooks_count: workbooksConnection { totalCount }
      total_datasources_count: publishedDatasourcesConnection { totalCount }
    }
    """
    resp = server.metadata.query(totalCountsQuery)
    total_workbooks = resp['data']['total_workbooks_count']['totalCount']
    total_datasources = resp['data']['total_datasources_count']['totalCount']

    print("--------------------------\nFinished processing Custom SQL tables on this site... Writing to results files now")

    ## Outputting summary to customSQL-stats-summary.txt file
    with open("./customSQL-stats-summary.txt", 'w', newline='') as file:
        print(table_stats, file=file)
        print("Total # of Custom SQL tables on site={} and {} of them ({:.2f}%) were not parsed by Catalog".format(table_stats['num_tables_seen'], table_stats['num_failed_parse'], percentify(safe_divide(table_stats['num_failed_parse'], table_stats['num_tables_seen']))), file=file)
        print("Total # of Workbooks on Site={}".format(total_workbooks), file=file)
        print("# of Workbooks using Custom SQL={} ({:.2f}% of total)".format(len(workbooks), percentify(safe_divide(len(workbooks), total_workbooks))), file=file)

        print("Total # of Published Data Sources on Site={}".format(total_datasources), file=file)
        print("# of Published Data Sources using Custom SQL={} ({:.2f}% of total)".format(len(datasources), percentify(safe_divide(len(datasources), total_datasources))), file=file)

    ## Outputting data to CSV file
    filename = './customSQL-stats.csv'
    with open(filename, 'w', newline='', encoding='utf-8') as file:
        csv_writer = csv.writer(file)

        columnHeaders = ['parent_content_type', 'parent_content_graph_id', 'custom_sql_graph_id', 'sql_failed_to_parse', 'query_string', 'database_type']
        csv_writer.writerow(columnHeaders)

        serialize_to_csv(csv_writer, workbooks, 'workbook')
        serialize_to_csv(csv_writer, datasources, 'published datasource')


def safe_divide(num, denom):
    return num / denom if denom else 0


# Serializes info to a CSV file
def serialize_to_csv(writer, collection, content_type):
    ## Create a row for each Custom SQL table in each workbook or data source
    for content_item_id in collection.keys():
        for cust_sql_table_id in collection[content_item_id]['customSQLTables'].keys():
            cust_sql_table = collection[content_item_id]['customSQLTables'][cust_sql_table_id]

            new_row = [content_type]
            new_row.append(content_item_id)
            new_row.append(cust_sql_table_id)
            new_row.append(cust_sql_table['sql_failed_to_parse'])
            sql = replace_newlines(cust_sql_table['query_string'])
            new_row.append(sql)
            new_row.append(cust_sql_table['database_type'])

            writer.writerow(new_row)


def percentify(decimal):
    return decimal * 100


### Returns a string where newline patterns are replaced with a triple tab
### so that Tableau's CSV connector connects to the output file more easily.
def replace_newlines(sqlstring):
    sql = sqlstring.replace('\r\n', '\t\t\t')
    sql = sql.replace('\n', '\t\t\t')
    sql = sql.replace('\r', '\t\t\t')

    return sql


## Used to check if the JSON returned for this Custom SQL table matches
## all of the expectations. Some tables are returned malformatted due to ingestion errors or delays.
## Returns True if the table can be processed, False if it should be skipped.
def can_process_table(customSQLTable, table_stats):
    if len(customSQLTable['columns']) == 0:
        logging.debug("Table {} has no columns and will be skipped".format(customSQLTable['id']))
        table_stats['num_no_columns'] += 1
        return False

    if len(customSQLTable['columns'][0]['workbooks_directly_connected']) == 0 and len(customSQLTable['columns'][0]['datasources_directly_connected']) == 0:
        logging.debug("Table {} has nothing in either `workbooks_directly_connected` or `datasources_directly_connected`, so it will be skipped".format(customSQLTable['id']))
        table_stats['num_no_workbooks_or_ds_connected'] += 1
        return False

    if ('database' not in customSQLTable) or customSQLTable['database'] is None:
        logging.debug("Table {} is missing the database reference and will be skipped".format(customSQLTable['id']))
        table_stats['num_no_database'] += 1
        return False

    if customSQLTable['tables'] is None:
        logging.debug("Table {} has None for tables directly referenced and will be skipped".format(customSQLTable['id']))
        table_stats['num_no_connected_tables'] += 1
        return False

    return True


def process_page(response, workbooks, datasources, table_stats):
    customSQLTables = response['data']['customSQLTablesConnection']['nodes']

    for table in customSQLTables:
        table_stats['num_tables_seen'] += 1
        table_stats['num_failed_parse'] += 1 if has_failed_sql(table) else 0

        if not can_process_table(table, table_stats):
            continue

        ## This is a Custom SQL table connecting to a workbook
        if bool(table['columns'][0]['workbooks_directly_connected'][0]['datasource']):
            object_id = table['columns'][0]['workbooks_directly_connected'][0]['datasource']['workbook']['id']
            process_table_for_collection(table, object_id, workbooks)
        ## This is a Custom SQL table connecting to a published data source
        else:
            object_id = table['columns'][0]['datasources_directly_connected'][0]['datasource']['id']
            process_table_for_collection(table, object_id, datasources)


def process_table_for_collection(table, object_id, collection):
    ## This is the first time we've seen this workbook or data source
    if object_id not in collection:
        collection[object_id] = {}
        collection[object_id]['customSQLTables'] = {}
        collection[object_id]['customSQLTables'][table['id']] = {}
        extract_sql_table_info(table, collection[object_id]['customSQLTables'][table['id']])
    else:
        if table['id'] in collection[object_id]['customSQLTables']:
            logging.debug('Seeing same CustomSQLTable twice. Skipping adding to dictionary. Table ID: {}'.format(table['id']))
        else:
            collection[object_id]['customSQLTables'][table['id']] = {}
            extract_sql_table_info(table, collection[object_id]['customSQLTables'][table['id']])

    logging.info("Processed table id={} and added to collection".format(table['id']))


def extract_sql_table_info(source_table_dict, dest_table_dict):
    dest_table_dict['sql_failed_to_parse'] = has_failed_sql(source_table_dict)
    dest_table_dict['query_string'] = source_table_dict['query']
    dest_table_dict['database_type'] = source_table_dict['database']['connectionType']


def has_failed_sql(table):
    return len(table['tables']) == 0


if __name__ == '__main__':
    main()
```
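For readers unfamiliar with the Metadata API response shape, the deep indexing in `process_page` (for example `table['columns'][0]['workbooks_directly_connected'][0]['datasource']['workbook']['id']`) assumes each node of `customSQLTablesConnection` looks roughly like the sketch below. The IDs, names, and query text here are hypothetical; only the keys mirror what the GraphQL query above requests.

```python
# One node from customSQLTablesConnection, trimmed to the fields the script reads.
# All values below are made up for illustration.
example_table = {
    'id': 'custom-sql-table-id-123',
    'query': "SELECT * FROM orders WHERE region = 'West'",
    'database': {'connectionType': 'postgres'},
    'tables': [{'id': 'table-id-456'}],  # an empty list means Catalog failed to parse the SQL
    'columns': [{
        # Both aliases come from the same referencedByFields list; the inline
        # fragments decide whether 'datasource' is populated for a workbook
        # (EmbeddedDatasource) or for a published data source.
        'workbooks_directly_connected': [
            {'datasource': {'workbook': {'name': 'Sales Overview', 'id': 'workbook-id-789'}}}
        ],
        'datasources_directly_connected': [
            {'datasource': {}}  # empty here because this field belongs to an embedded datasource
        ],
    }],
}
```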
Review comment: We talked about me merging my graphql pager so you can use it -- don't have to block on that for this review, but a reminder to both of us that I said I'd do that :P
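Not the reviewer's actual pager, but a rough sketch of what a reusable cursor pager over `server.metadata.query` might look like, assuming the same conventions this script already uses: an `AFTER_TOKEN_SIGNAL` placeholder in the query and a connection that returns `pageInfo { hasNextPage endCursor }`.

```python
def metadata_pages(server, query, connection_name):
    """Yield one page of nodes at a time from a Metadata API connection query.

    `query` must contain the AFTER_TOKEN_SIGNAL placeholder and request
    pageInfo { hasNextPage endCursor } on the named connection.
    """
    cursor = 'null'
    while True:
        resp = server.metadata.query(query.replace('AFTER_TOKEN_SIGNAL', cursor))
        connection = resp['data'][connection_name]
        yield connection['nodes']

        page_info = connection['pageInfo']
        if not page_info['hasNextPage']:
            break
        cursor = '"' + page_info['endCursor'] + '"'


# Hypothetical usage with the query defined in custom-sql-statistics.py:
# for nodes in metadata_pages(server, query, 'customSQLTablesConnection'):
#     for table in nodes:
#         ...  # process each Custom SQL table
```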