Skip to content

Conversation

georgesittas
Copy link
Contributor

@georgesittas georgesittas commented Apr 25, 2025

Fixes #4252

I reproduced the issue locally by following the steps outlined in the above issue:

>>> ctx.models['"db"."sqlmesh_example"."incremental_model"'].json()
'{"name":"sqlmesh_example.incremental_model","project":"","start":"2020-01-01","cron":"@daily","tags":[],"dialect":"duckdb","kind":{"name":"INCREMENTAL_BY_TIME_RANGE","on_destructive_change":"ERROR","dialect":"duckdb","forward_only":false,"disable_restatement":false,"time_column":{"column":"\\"event_date\\"","format":"%Y-%m-%d"},"partition_by_time_column":true},"partitioned_by":[],"clustered_by":[],"default_catalog":"db","audits":[],"grains":["(id, event_date)"],"references":[],"allow_partials":false,"signals":[],"enabled":true,"python_env":{},"jinja_macros":{"packages":{},"root_macros":{},"global_objs":{},"create_builtins_module":"sqlmesh.utils.jinja","top_level_packages":[]},"audit_definitions":{},"mapping_schema":{"\\"db\\"":{"\\"sqlmesh_example\\"":{"\\"seed_model\\"":{"id":"INT","item_id":"INT","event_date":"DATE"}}}},"extract_dependencies_from_query":true,"pre_statements":[],"post_statements":["SELECT\\n  id,\\n  item_id,\\n  event_date,\\nFROM\\n  sqlmesh_example.seed_model\\nWHERE\\n  event_date BETWEEN @start_date AND @end_date\\n;"],"on_virtual_update":[],"query":"SELECT\\n  id,\\n  item_id,\\n  event_date,\\nFROM\\n  sqlmesh_example.seed_model\\nWHERE\\n  event_date BETWEEN @start_date AND @end_date\\n;","source_type":"sql"}'

I observed that we hit this line for the Semicolon expression and store the query's SQL in its meta dict. This explains the duplication of the model's query in the above serialized post_statements key.

My initial approach was to avoid setting meta["sql"] for Semicolon expressions, but then it occurred to me that we shouldn't be storing any Semicolon expressions in the model, because they can also affect the data/metadata hash.

I verified for the issue's examples that if I ran plan, removed the semicolon+comment, and then ran plan again then a diff was shown. This is because Semicolon is gen'd since it's in the post_statements list and we get back a "SEMICOLON" string in the _data_hash_values. I added a test for this.

I think we need a migration for this change. I'll take it to the finish line on Monday soon.

@georgesittas georgesittas requested review from tobymao and a team April 25, 2025 22:53
@georgesittas georgesittas force-pushed the jo/fix_semicolon_bug branch from f6fdb9b to 8a0c508 Compare April 25, 2025 22:55
@georgesittas georgesittas force-pushed the jo/fix_semicolon_bug branch 6 times, most recently from ba228ab to d9a2ff8 Compare May 5, 2025 13:33
@georgesittas georgesittas force-pushed the jo/fix_semicolon_bug branch from d9a2ff8 to 4ba0256 Compare May 5, 2025 21:25
@georgesittas georgesittas requested a review from izeigerman May 5, 2025 21:27
@georgesittas georgesittas merged commit 3117d76 into main May 6, 2025
23 checks passed
@georgesittas georgesittas deleted the jo/fix_semicolon_bug branch May 6, 2025 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Comment at end of model script results in model query to be included in post statements
2 participants