Skip to content

Conversation

@alamb
Copy link
Contributor

@alamb alamb commented Oct 7, 2024

Which issue does this PR close?

Follow on to #12759

Rationale for this change

@comphead and myself wondered why a particular code path existed, and @berkaysynnada helped find the answer: #12759 (comment)

Let's encode that learning into comments so we don't have to re-discover it the next time we read the code.

What changes are included in this PR?

Add comments

Are these changes tested?

N/A

Are there any user-facing changes?

No, internal comments only

@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Oct 7, 2024
@alamb
Copy link
Contributor Author

alamb commented Oct 7, 2024

I think CI / clippy will be fixed by #12724

Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @alamb for the clarification

@berkaysynnada berkaysynnada merged commit d8405ba into apache:main Oct 8, 2024
24 checks passed
@alamb alamb deleted the alamb/clarify_properties branch October 8, 2024 10:36
@alamb
Copy link
Contributor Author

alamb commented Oct 8, 2024

Thanks for the review @berkaysynnada

alamb added a commit that referenced this pull request Oct 8, 2024
* Add support for external tables with qualified names (#12645)

* Make  support schemas

* Set default name to table

* Remove print statements and stale comment

* Add tests for create table

* Fix typo

* Update datafusion/sql/src/statement.rs

Co-authored-by: Jonah Gao <[email protected]>

* convert create_external_table to objectname

* Add sqllogic tests

* Fix failing tests

---------

Co-authored-by: Jonah Gao <[email protected]>

* Fix Regex signature types (#12690)

* Fix Regex signature types

* Uncomment the shared tests in string_query.slt.part and removed tests copies everywhere else

* Test `LIKE` and `MATCH` with flags; Remove new tests from regexp.slt

* Refactor `ByteGroupValueBuilder` to use `MaybeNullBufferBuilder` (#12681)

* Fix malformed hex string literal in docs (#12708)

* Simplify match patterns in coercion rules (#12711)

Remove conditions where unnecessary.
Refactor to improve readability.

* Remove aggregate functions dependency on frontend (#12715)

* Remove aggregate functions dependency on frontend

DataFusion is a SQL query engine and also a reusable library for
building query engines. The core functionality should not depend on
frontend related functionalities like `sqlparser` or `datafusion-sql`.

* Remove duplicate license header

* Minor: Remove clone in `transform_to_states` (#12707)

* rm clone

Signed-off-by: jayzhan211 <[email protected]>

* fmt

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>

* Refactor tests for union sorting properties, add tests for unions and constants (#12702)

* Refactor tests for union sorting properties

* update doc test

* Undo import reordering

* remove unecessary static lifetimes

* Fix: support Qualified Wildcard in count aggregate function (#12673)

* Reduce code duplication in `PrimitiveGroupValueBuilder` with const generics (#12703)

* Reduce code duplication in `PrimitiveGroupValueBuilder` with const generics

* Fix docs

* Disallow duplicated qualified field names (#12608)

* Disallow duplicated qualified field names

* Fix tests

* Optimize base64/hex decoding by pre-allocating output buffers (~2x faster) (#12675)

* add bench

* replace macro with generic function

* remove duplicated code

* optimize base64/hex decode

* Allow DynamicFileCatalog support to query partitioned file (#12683)

* support to query partitioned table for dynamic file catalog

* cargo clippy

* split partitions inferring to another function

* Support `LIMIT` Push-down logical plan optimization for `Extension` nodes (#12685)

* Update trait `UserDefinedLogicalNodeCore`

Signed-off-by: Austin Liu <[email protected]>

* Update corresponding interface

Signed-off-by: Austin Liu <[email protected]>

Add rewrite rule for `push-down-limit` for `Extension`

Signed-off-by: Austin Liu <[email protected]>

* Add rewrite rule for `push-down-limit` for `Extension` and tests

Signed-off-by: Austin Liu <[email protected]>

* Update corresponding interface

Signed-off-by: Austin Liu <[email protected]>

* Reorganize to match guard

Signed-off-by: Austin Liu <[email protected]>

* Clena up

Signed-off-by: Austin Liu <[email protected]>

Clean up

Signed-off-by: Austin Liu <[email protected]>

---------

Signed-off-by: Austin Liu <[email protected]>

* Fix AvroReader: Add union resolving for nested struct arrays (#12686)

* Add union resolving for nested struct arrays

* Add test

* Change test

* Reproduce index error

* fmt

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Adds macros for creating `WindowUDF` and `WindowFunction` expression (#12693)

* Adds macro for udwf singleton

* Adds a doc comment parameter to macro

* Add doc comment for `create_udwf` macro

* Uses default constructor

* Update `Cargo.lock` in `datafusion-cli`

* Fixes: expand `$FN_NAME` in doc strings

* Adds example for macro usage

* Renames macro

* Improve doc comments

* Rename udwf macro

* Minor: doc copy edits

* Adds macro for creating fluent-style expression API

* Adds support for 1 or more parameters in expression function

* Rewrite doc comments

* Rename parameters

* Minor: formatting

* Adds doc comment for `create_udwf_expr` macro

* Improve example docs

* Hides extraneous code in doc comments

* Add a one-line readme

* Adds doc test assertions + minor formatting fixes

* Adds common macro for defining user-defined window functions

* Adds doc comment for `define_udwf_and_expr`

* Defines `RowNumber` using common macro

* Add usage example for common macro

* Adds usage for custom constructor

* Add examples for remaining patterns

* Improve doc comments for usage examples

* Rewrite inner line docs

* Rewrite `create_udwf_expr!` doc comments

* Minor doc improvements

* Fix doc test and usage example

* Add inline comments for macro patterns

* Minor: change doc comment in example

* Support unparsing plans with both Aggregation and Window functions (#12705)

* Support unparsing plans with both Aggregation and Window functions (#35)

* Fix unparsing for aggregation grouping sets

* Add test for grouping set unparsing

* Update datafusion/sql/src/unparser/utils.rs

Co-authored-by: Jax Liu <[email protected]>

* Update datafusion/sql/src/unparser/utils.rs

Co-authored-by: Jax Liu <[email protected]>

* Update

* More tests

---------

Co-authored-by: Jax Liu <[email protected]>

* Fix strpos invocation with dictionary and null (#12712)

In 1b3608d `strpos` signature was
modified to indicate it supports dictionary as input argument, but the
invoke method doesn't support them.

* docs: Update DataFusion introduction to clarify that DataFusion does provide an "out of the box" query engine (#12666)

* Update DataFusion introduction to show that DataFusion offers packaged versions for end users

* change order

* Update README.md

Co-authored-by: Andrew Lamb <[email protected]>

* refine wording and update user guide for consistency

* prettier

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Framework for generating function docs from embedded code documentation (#12668)

* Initial work on #12432 to allow for generation of udf docs from embedded documentation in the code

* Add missing license header.

* Fixed examples.

* Fixing a really weird RustRover/wsl ... something. No clue what happened there.

* permission change

* Cargo fmt update.

* Refactored Documentation to allow it to be used in a const.

* Add documentation for syntax_example

* Refactoring Documentation based on PR feedback.

* Cargo fmt update.

* Doc update

* Fixed copy/paste error.

* Minor text updates.

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Add IMDB(JOB) Benchmark [2/N] (imdb queries) (#12529)

* imdb dataset

* cargo fmt

* Add 113 queries for IMDB(JOB)

Signed-off-by: Austin Liu <[email protected]>

* Add `get_query_sql` from `query_id` string

Signed-off-by: Austin Liu <[email protected]>

* Fix CSV reader & Remove Parquet partition

Signed-off-by: Austin Liu <[email protected]>

* Add benchmark IMDB runner

Signed-off-by: Austin Liu <[email protected]>

* Add `run_imdb` script

Signed-off-by: Austin Liu <[email protected]>

* Add checker for imdb option

Signed-off-by: Austin Liu <[email protected]>

* Add SLT for IMDB

Signed-off-by: Austin Liu <[email protected]>

* Fix `get_query_sql()` for CI roundtrip test

Signed-off-by: Austin Liu <[email protected]>

Fix `get_query_sql()` for CI roundtrip test

Signed-off-by: Austin Liu <[email protected]>

Fix `get_query_sql()` for CI roundtrip test

Signed-off-by: Austin Liu <[email protected]>

* Clean up

Signed-off-by: Austin Liu <[email protected]>

* Add missing license

Signed-off-by: Austin Liu <[email protected]>

* Add IMDB(JOB) queries `2b` to `5c`

Signed-off-by: Austin Liu <[email protected]>

* Add `INCLUDE_IMDB` in CI verify-benchmark-results

Signed-off-by: Austin Liu <[email protected]>

* Prepare IMDB dataset

Signed-off-by: Austin Liu <[email protected]>

Prepare IMDB dataset

Signed-off-by: Austin Liu <[email protected]>

* use uint as id type

* format

* Seperate `tpch` and `imdb` benchmarking CI jobs

Signed-off-by: Austin Liu <[email protected]>

Fix path

Signed-off-by: Austin Liu <[email protected]>

Fix path

Signed-off-by: Austin Liu <[email protected]>

Remove `tpch` in `imdb` benchmark

Signed-off-by: Austin Liu <[email protected]>

* Remove IMDB(JOB) slt in CI

Signed-off-by: Austin Liu <[email protected]>

Remove IMDB(JOB) slt in CI

Signed-off-by: Austin Liu <[email protected]>

---------

Signed-off-by: Austin Liu <[email protected]>
Co-authored-by: DouPache <[email protected]>

* Minor: avoid clone while calculating union equivalence properties (#12722)

* Minor: avoid clone while calculating union equivalence properties

* Update datafusion/physical-expr/src/equivalence/properties.rs

* fmt

* Simplify streaming_merge function parameters (#12719)

* simplify streaming_merge function parameters

* revert test change

* change StreamingMergeConfig into builder pattern

* Fix links on docs index page (#12750)

* Provide field and schema metadata missing on cross joins, and union with null fields. (#12729)

* test: reproducer for missing schema metadata on cross join

* fix: pass thru schema metadata on cross join

* fix: preserve metadata when transforming to view types

* test: reproducer for missing field metadata in left hand NULL field of union

* fix: preserve field metadata from right side of union

* chore: safe indexing

* Minor: Update string tests for strpos (#12739)

* Apply `type_union_resolution` to array and values (#12753)

* cleanup make array coercion rule

Signed-off-by: jayzhan211 <[email protected]>

* change to type union resolution

Signed-off-by: jayzhan211 <[email protected]>

* change value too

Signed-off-by: jayzhan211 <[email protected]>

* fix tpyo

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>

* Add `DocumentationBuilder::with_standard_argument` to reduce copy/paste (#12747)

* Add `DocumentationBuilder::with_standard_expression` to reduce copy/paste

* fix doc

* fix standard argument

* Update docs

* Improve documentation to explain what is different

* fix `equal_to` in `PrimitiveGroupValueBuilder` (#12758)

* fix `equal_to` in `PrimitiveGroupValueBuilder`.

* fix typo.

* add uts.

* reduce calling of `is_null`.

* Minor: doc how field name is to be set (#12757)

* Fix `equal_to` in `ByteGroupValueBuilder` (#12770)

* Fix `equal_to` in `ByteGroupValueBuilder`

* refactor null_equal_to

* Update datafusion/physical-plan/src/aggregates/group_values/group_column.rs

* Allow simplification even when nullable (#12746)

The nullable requirement seem to have been added in #1401 but as far as
I can tell they are not needed for these 2 cases.

I think this can be shown using this truth table: (generated using
datafusion-cli without this patch)
```
> CREATE TABLE t (v BOOLEAN) as values (true), (false), (NULL);
> select t.v, t2.v, t.v AND (t.v OR t2.v), t.v OR (t.v AND t2.v) from t cross join t as t2;
+-------+-------+---------------------+---------------------+
| v     | v     | t.v AND t.v OR t2.v | t.v OR t.v AND t2.v |
+-------+-------+---------------------+---------------------+
| true  | true  | true                | true                |
| true  | false | true                | true                |
| true  |       | true                | true                |
| false | true  | false               | false               |
| false | false | false               | false               |
| false |       | false               | false               |
|       | true  |                     |                     |
|       | false |                     |                     |
|       |       |                     |                     |
+-------+-------+---------------------+---------------------+
```

And it seems Spark applies both of these and DuckDB applies only the
first one.

* Fix unnest conjunction with selecting wildcard expression (#12760)

* fix unnest statement with wildcard expression

* add commnets

* Improve `round` scalar function unparsing for Postgres (#12744)

* Postgres: enforce required `NUMERIC` type for `round` scalar function (#34)

Includes initial support for dialects to override scalar functions unparsing

* Document scalar_function_to_sql_overrides fn

* Fix stack overflow calculating projected orderings (#12759)

* Fix stack overflow calculating projected orderings

* fix docs

* Port / Add Documentation for `VarianceSample` and `VariancePopulation` (#12742)

* Upgrade arrow/parquet to `53.1.0` / fix clippy (#12724)

* Update to arrow/parquet 53.1.0

* Update some API

* update for changed file sizes

* Use non deprecated APIs

* Use ParquetMetadataReader from @etseidl

* remove upstreamed implementation

* Update CSV schema

* Use upstream is_null and is_not_null kernels

* feat: add support for Substrait ExtendedExpression (#12728)

* Add support for serializing and deserializing Substrait ExtendedExpr message

* Address clippy reviews

* Reuse existing rename method

* Transformed::new_transformed: Fix documentation formatting (#12787)

Co-authored-by: Andrew Lamb <[email protected]>

* fix: Correct results for grouping sets when columns contain nulls (#12571)

* Fix grouping sets behavior when data contains nulls

* PR suggestion comment

* Update new test case

* Add grouping_id to the logical plan

* Add doc comment next to INTERNAL_GROUPING_ID

* Fix unparsing of Aggregate with grouping sets

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Migrate documentation for all string functions from scalar_functions.md to code  (#12775)

* Added documentation for string and unicode functions.

* Fixed issues with aliases.

* Cargo fmt.

* Minor doc fixes.

* Update docs for var_pop/samp

---------

Co-authored-by: Andrew Lamb <[email protected]>

* Account for constant equivalence properties in union, tests (#12562)

* Minor: clarify comment about empty dependencies (#12786)

* Introduce Signature::String and return error if  input of `strpos` is integer (#12751)

* fix sig

Signed-off-by: jayzhan211 <[email protected]>

* fix

Signed-off-by: jayzhan211 <[email protected]>

* fix error

Signed-off-by: jayzhan211 <[email protected]>

* fix all signature

Signed-off-by: jayzhan211 <[email protected]>

* fix all signature

Signed-off-by: jayzhan211 <[email protected]>

* change default type

Signed-off-by: jayzhan211 <[email protected]>

* clippy

Signed-off-by: jayzhan211 <[email protected]>

* fix docs

Signed-off-by: jayzhan211 <[email protected]>

* rm deadcode

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* cleanup

Signed-off-by: jayzhan211 <[email protected]>

* rm test

Signed-off-by: jayzhan211 <[email protected]>

---------

Signed-off-by: jayzhan211 <[email protected]>

* Minor: improve docs on MovingMin/MovingMax (#12790)

* Add slt tests (#12721)

---------

Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: Austin Liu <[email protected]>
Co-authored-by: OussamaSaoudi <[email protected]>
Co-authored-by: Jonah Gao <[email protected]>
Co-authored-by: Dmitrii Blaginin <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Tomoaki Kawada <[email protected]>
Co-authored-by: Piotr Findeisen <[email protected]>
Co-authored-by: Jay Zhan <[email protected]>
Co-authored-by: HuSen <[email protected]>
Co-authored-by: Emil Ejbyfeldt <[email protected]>
Co-authored-by: Simon Vandel Sillesen <[email protected]>
Co-authored-by: Jax Liu <[email protected]>
Co-authored-by: Austin Liu <[email protected]>
Co-authored-by: JonasDev1 <[email protected]>
Co-authored-by: jcsherin <[email protected]>
Co-authored-by: Sergei Grebnov <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: Bruce Ritchie <[email protected]>
Co-authored-by: DouPache <[email protected]>
Co-authored-by: mertak-synnada <[email protected]>
Co-authored-by: Bryce Mecum <[email protected]>
Co-authored-by: wiedld <[email protected]>
Co-authored-by: kamille <[email protected]>
Co-authored-by: Weston Pace <[email protected]>
Co-authored-by: Val Lorentz <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-expr Changes to the physical-expr crates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants