- dbt buildmeans building this project wide open.
- Other queries mentioned in the blog are found in the analysis folder.
- Create warehouse scripts are also found in analysis folder.
- note: analysis files are missing .sql extension so dbt does not parse them.
This repo was forked from https://github.com/clausherther/dbt-tpch and edited so it runs with modern version of dbt.
His readme is below
This is a dbt sample project for Snowflake using the TPC-H example dataset that ships as a shared database with Snowflake.
More details can be found on the TPC website and in the specification document.
The project is laid out as follows:
- _schemacontains schema defintions and tests in one- .ymlfile per model
- _sourcecontains source definitions
- basecontains- ephemeralbase models that serve as wrappers around source models to define column names and data types where necessary
- odsrepresents an Operational Data Store (ODS), i.e. a mostly normalized view of the data. These models may contain more columns than we may choose to publish to the dimensional data warehouse, but don't contain any reporting models.
- whrepresents the Dimensional Data Warehouse (WH). These models use a star schema methodology made up of fact (- fct_*) and dimension (- dim_*) tables. In addition, this schema contains report models (- rpt_*) that combine fact and dimension tables for business reporting.
- Add a new profile to ~/.dbt/profiles.ymlcalledtpch.
tpch:
    target: dev
    outputs:
        prod:
            type: snowflake
            threads: 8
            account: <account>
            user: <user>
            password: <password>
            role: <role>
            database: <target_database>
            warehouse: <snowflake_warehouse>
            schema: <default_schema>
        dev:
            type: snowflake
            threads: 8
            account: <account>
            user: <user>
            password: <password>
            role: <role>
            database: <target_database>
            warehouse: <snowflake_warehouse>
            schema: <default_schema>
This project make use of the dbt_utils package, so you will need to call dbt deps before running any model to ensure dbt can combile all package macros.
Also, note that you can change the scaling factor of the TPCH dataset by switching the source database in _source/source_tpch.yml from the default of 10 to 100 or 1000 by changing the database name accordingly.
version: 2
sources:
  - name: tpch
    database: SNOWFLAKE_SAMPLE_DATA
    schema: TPCH_SF10
    loader: Snowflake
...
Using an X-Small warehouse (1 credit / hour), the project currently runs in about 5 minutes against the TPCH_SF10 database.
- What is dbt?
- Read the dbt viewpoint
- Installation
- Join the chat on Slack for live questions and support.