Skip to content

Conversation

@zhangfengcdt
Copy link
Member

@zhangfengcdt zhangfengcdt commented Dec 1, 2025

Did you read the Contributor Guide?

Is this PR related to a ticket?

  • Yes, and the PR name follows the format [GH-XXX] my subject. Closes #<issue_number>

What changes were proposed in this PR?

  • Add ST_Collect_Agg as a new spatial aggregate function

  • Collects all geometries in a column into a multi-geometry (MultiPoint, MultiLineString, MultiPolygon, or
    GeometryCollection)

  • Unlike ST_Union_Agg, this function does not dissolve boundaries - it simply collects geometries

  • Add ST_Collect_Agg class in AggregateFunctions.scala

  • Add DataFrame API support in st_aggregates.scala

  • Register function in Spark SQL catalog

  • Add Python API in st_aggregates.py

  • Add Scala and Python tests

  • Add documentation

How was this patch tested?

  • Scala unit tests for various geometry types (points, polygons, mixed)
  • Test GROUP BY functionality
  • Test null handling (nulls are skipped)
  • Test duplicate preservation (unlike ST_Union_Agg)
  • DataFrame API tests (Scala and Python)

Did this PR include necessary documentation updates?

  • Yes, I have updated the documentation.

@github-actions github-actions bot added the root label Dec 2, 2025
@petern48 petern48 linked an issue Dec 2, 2025 that may be closed by this pull request
@petern48
Copy link
Collaborator

petern48 commented Dec 2, 2025

Considering this issue (#2547), shall we maybe skip ST_Collect_Aggr and name it ST_Collect_Agg? Alternatively, we could support both if there are worries about users expecting Aggr to work. Personally think it's fine to skip _Aggr since this is a new function. @jiayuasu WDYT?

@jiayuasu
Copy link
Member

jiayuasu commented Dec 2, 2025

Agree. Let's all use _agg for all new functions now. But we need to have a separate PR that allows old aggr functions to be registered under both names

@zhangfengcdt zhangfengcdt marked this pull request as ready for review December 2, 2025 19:02
@zhangfengcdt zhangfengcdt changed the title [GH-2545] Add ST_Collect_Aggr aggregate function [GH-2545] Add ST_Collect_Agg aggregate function Dec 2, 2025
@github-actions github-actions bot removed the root label Dec 4, 2025
@jiayuasu jiayuasu added this to the sedona-1.8.1 milestone Dec 4, 2025
@jiayuasu jiayuasu requested a review from Copilot December 5, 2025 07:01
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new spatial aggregate function ST_Collect_Agg that collects geometries into multi-geometries without dissolving boundaries, differentiating it from the existing ST_Union_Agg function.

Key Changes:

  • Implements ST_Collect_Agg aggregate function that preserves individual geometries and duplicates
  • Adds DataFrame API support for both Scala and Python
  • Includes comprehensive test coverage for various geometry types and edge cases

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/AggregateFunctions.scala Core implementation of ST_Collect_Agg aggregator class
spark/common/src/main/scala/org/apache/spark/sql/sedona_sql/expressions/st_aggregates.scala DataFrame API methods for ST_Collect_Agg
spark/common/src/main/scala/org/apache/sedona/sql/UDF/Catalog.scala Registration of ST_Collect_Agg in function catalog
spark/common/src/test/scala/org/apache/sedona/sql/aggregateFunctionTestScala.scala SQL-based tests for various use cases
spark/common/src/test/scala/org/apache/sedona/sql/dataFrameAPITestScala.scala DataFrame API test for basic functionality
python/sedona/spark/sql/st_aggregates.py Python API implementation with documentation
python/tests/sql/test_aggregate_functions.py Python tests for aggregate function behavior
python/tests/sql/test_dataframe_api.py Python DataFrame API test
docs/api/sql/AggregateFunction.md Documentation with usage examples

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jiayuasu jiayuasu merged commit ec7d9a6 into apache:master Dec 5, 2025
41 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement ST_Collect_Aggr

3 participants