
Conversation

Contributor

@arwas11 arwas11 commented Feb 5, 2025

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)
    • from_xy: screen/3pW4NWMsvfvVeYd, screen/3SimxdQU93zjinv, screen/rfsK45NBkxkUdz7

Fixes #393394365 🦕
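
A conceptual sketch of what the new `GeoSeries.from_xy` does: pair up x and y coordinate sequences into POINT geometries. This is a plain-Python stand-in so the sketch runs without a BigQuery session — the real method returns a BigQuery DataFrames `GeoSeries`, and the `from_xy` helper below is illustrative only, mirroring the geopandas call shape.

```python
# Stand-in sketch of the from_xy call shape (not the real implementation,
# which builds a GeoSeries backed by BigQuery geography values).
def from_xy(x, y):
    # Mirror the geopandas contract: x and y are parallel sequences.
    if len(x) != len(y):
        raise ValueError("x and y must be the same length")
    # Zip each coordinate pair into a WKT-style POINT.
    return [f"POINT ({xi} {yi})" for xi, yi in zip(x, y)]

points = from_xy([2.5, 5.0, -3.0], [0.5, 1.0, 1.5])
print(points[0])  # POINT (2.5 0.5)
```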


@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Feb 5, 2025
@arwas11 arwas11 requested a review from tswast February 5, 2025 19:51
@arwas11 arwas11 changed the title from feat: add GeoSeries.from_xy to feat: add GeoSeries.from_xy() Feb 5, 2025
@tswast tswast added the owlbot:run Add this label to trigger the Owlbot post processor. label Feb 5, 2025
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Feb 5, 2025
Collaborator

@tswast tswast left a comment


Thanks!

@tswast tswast marked this pull request as ready for review February 5, 2025 19:56
@tswast tswast requested review from a team as code owners February 5, 2025 19:56
@tswast tswast requested a review from Genesis929 February 5, 2025 19:56
@arwas11 arwas11 enabled auto-merge (squash) February 5, 2025 20:06
@tswast
Collaborator

tswast commented Feb 5, 2025

Doctest failures:

FAILED third_party/bigframes_vendored/pandas/core/frame.py::frame.DataFrame.map
FAILED third_party/bigframes_vendored/pandas/core/frame.py::frame.DataFrame.apply
FAILED third_party/bigframes_vendored/pandas/core/series.py::series.Series.apply
FAILED third_party/bigframes_vendored/pandas/core/series.py::series.Series.map
FAILED third_party/bigframes_vendored/pandas/core/series.py::series.Series.mask
FAILED bigframes/dataframe.py::bigframes.dataframe.DataFrame.map
FAILED bigframes/dataframe.py::bigframes.dataframe.DataFrame.applymap
FAILED bigframes/pandas/io/api.py::bigframes.pandas.io.api.read_gbq_function
FAILED bigframes/session/__init__.py::bigframes.session.Session.read_gbq_function

These don't appear to be related to this change.

@tswast tswast disabled auto-merge February 5, 2025 20:31
@tswast
Collaborator

tswast commented Feb 5, 2025

Notebook test failures:

Failed: generative_ai/bq_dataframes_llm_kmeans.ipynb
Failed: geo/geoseries.ipynb
Failed: getting_started/getting_started_bq_dataframes.ipynb
Failed: location/regionalized.ipynb_us-central1
Failed: location/regionalized.ipynb_us
Failed: remote_functions/remote_function.ipynb
Failed: remote_functions/remote_function_usecases.ipynb

The geoseries.ipynb one does look related. Let's fix before merging.

error
__ /tmpfs/src/github/python-bigquery-dataframes/notebooks/geo/geoseries.ipynb __
---------------------------------------------------------------------------
bigframes.geopandas.GeoSeries.from_xy(geo_points.x, geo_points.y)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File /tmpfs/src/github/python-bigquery-dataframes/.nox/notebook-3-10/lib/python3.10/site-packages/IPython/core/formatters.py:770, in PlainTextFormatter.__call__(self, obj)
    763 stream = StringIO()
    764 printer = pretty.RepresentationPrinter(stream, self.verbose,
    765     self.max_width, self.newline,
    766     max_seq_length=self.max_seq_length,
    767     singleton_pprinters=self.singleton_printers,
    768     type_pprinters=self.type_printers,
    769     deferred_pprinters=self.deferred_printers)
--> 770 printer.pretty(obj)
    771 printer.flush()
    772 return stream.getvalue()

File /tmpfs/src/github/python-bigquery-dataframes/.nox/notebook-3-10/lib/python3.10/site-packages/IPython/lib/pretty.py:419, in RepresentationPrinter.pretty(self, obj)
    408                         return meth(obj, self, cycle)
    409                 if (
    410                     cls is not object
    411                     # check if cls defines __repr__
   (...)
    417                     and callable(_safe_getattr(cls, "__repr__", None))
    418                 ):
--> 419                     return _repr_pprint(obj, self, cycle)
    421     return _default_pprint(obj, self, cycle)
    422 finally:

File /tmpfs/src/github/python-bigquery-dataframes/.nox/notebook-3-10/lib/python3.10/site-packages/IPython/lib/pretty.py:794, in _repr_pprint(obj, p, cycle)
    792 """A pprint that just redirects to the normal repr function."""
    793 # Find newlines and replace them with p.break_()
--> 794 output = repr(obj)
    795 lines = output.splitlines()
    796 with p.group():

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/log_adapter.py:147, in method_logger.<locals>.wrapper(self, *args, **kwargs)
    144 _call_stack.append(full_method_name)
    146 try:
--> 147     return method(self, *args, **kwargs)
    148 except (NotImplementedError, TypeError) as e:
    149     # Log method parameters that are implemented in pandas but either missing (TypeError)
    150     # or not fully supported (NotImplementedError) in BigFrames.
    151     # Logging is currently supported only when we can access the bqclient through
    152     # self._block.expr.session.bqclient. Also, to avoid generating multiple queries
    153     # because of internal calls, we log only when the method is directly invoked.
    154     if hasattr(self, "_block") and len(_call_stack) == 1:

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/series.py:346, in Series.__repr__(self)
    343     return formatter.repr_query_job(self._compute_dry_run())
    345 self._cached()
--> 346 pandas_df, _, query_job = self._block.retrieve_repr_request_results(max_results)
    347 self._set_internal_query_job(query_job)
    349 pd_series = pandas_df.iloc[:, 0]

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/blocks.py:1453, in Block.retrieve_repr_request_results(self, max_results)
   1445 """
   1446 Retrieves a pandas dataframe containing only max_results many rows for use
   1447 with printing methods.
   1448 
   1449 Returns a tuple of the dataframe and the overall number of rows of the query.
   1450 """
   1452 # head caches full underlying expression, so row_count will be free after
-> 1453 head_result = self.session._executor.head(self.expr, max_results)
   1454 count = self.session._executor.get_row_count(self.expr)
   1456 arrow = head_result.to_arrow_table()

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/session/executor.py:401, in BigQueryCachingExecutor.head(self, array_value, n_rows)
    399 maybe_row_count = self._local_get_row_count(array_value)
    400 if (maybe_row_count is not None) and (maybe_row_count <= n_rows):
--> 401     return self.execute(array_value, ordered=True)
    403 if not self.strictly_ordered and not array_value.node.explicitly_ordered:
    404     # No user-provided ordering, so just get any N rows, its faster!
    405     return self.peek(array_value, n_rows)

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/session/executor.py:286, in BigQueryCachingExecutor.execute(self, array_value, ordered, col_id_overrides, use_explicit_destination, get_size_bytes, page_size, max_results)
    283 # Runs strict validations to ensure internal type predictions and ibis are completely in sync
    284 # Do not execute these validations outside of testing suite.
    285 if "PYTEST_CURRENT_TEST" in os.environ and len(col_id_overrides) == 0:
--> 286     self._validate_result_schema(array_value, iterator.schema)
    288 return ExecuteResult(
    289     arrow_batches=iterator_supplier,
    290     schema=adjusted_schema,
   (...)
    293     total_rows=iterator.total_rows,
    294 )

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/session/executor.py:648, in BigQueryCachingExecutor._validate_result_schema(self, array_value, bq_schema)
    642 def _validate_result_schema(
    643     self,
    644     array_value: bigframes.core.ArrayValue,
    645     bq_schema: list[bigquery.SchemaField],
    646 ):
    647     actual_schema = tuple(bq_schema)
--> 648     ibis_schema = bigframes.core.compile.test_only_ibis_inferred_schema(
    649         self.replace_cached_subtrees(array_value.node)
    650     )
    651     internal_schema = array_value.schema
    652     if not bigframes.features.PANDAS_VERSIONS.is_arrow_list_dtype_usable:

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/compile/api.py:83, in test_only_ibis_inferred_schema(node)
     81 node = _STRICT_COMPILER._preprocess(node)
     82 compiled = _STRICT_COMPILER.compile_node(node)
---> 83 items = tuple(
     84     bigframes.core.schema.SchemaItem(name, compiled.get_column_type(ibis_id))
     85     for name, ibis_id in zip(node.schema.names, compiled.column_ids)
     86 )
     87 return bigframes.core.schema.ArraySchema(items)

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/compile/api.py:84, in <genexpr>(.0)
     81 node = _STRICT_COMPILER._preprocess(node)
     82 compiled = _STRICT_COMPILER.compile_node(node)
     83 items = tuple(
---> 84     bigframes.core.schema.SchemaItem(name, compiled.get_column_type(ibis_id))
     85     for name, ibis_id in zip(node.schema.names, compiled.column_ids)
     86 )
     87 return bigframes.core.schema.ArraySchema(items)

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/compile/compiled.py:154, in UnorderedIR.get_column_type(self, key)
    147 def get_column_type(self, key: str) -> bigframes.dtypes.Dtype:
    148     ibis_type = typing.cast(
    149         bigframes.core.compile.ibis_types.IbisDtype,
    150         self._get_ibis_column(key).type(),
    151     )
    152     return typing.cast(
    153         bigframes.dtypes.Dtype,
--> 154         bigframes.core.compile.ibis_types.ibis_dtype_to_bigframes_dtype(ibis_type),
    155     )

File /tmpfs/src/github/python-bigquery-dataframes/bigframes/core/compile/ibis_types.py:307, in ibis_dtype_to_bigframes_dtype(ibis_dtype)
    305     return IBIS_TO_BIGFRAMES[ibis_dtypes.string]
    306 else:
--> 307     raise ValueError(
    308         f"Unexpected Ibis data type {ibis_dtype}. {constants.FEEDBACK_LINK}"
    309     )

ValueError: Unexpected Ibis data type point:geometry. Share your usecase with the BigQuery DataFrames team at the https://bit.ly/bigframes-feedback survey. You are currently running BigFrames version 1.35.0

Learn more about nbmake at https://github.com/treebeardtech/nbmake

@tswast tswast added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Feb 5, 2025
Collaborator


Per the notebook failure, please add some logic to ibis_dtype_to_bigframes_dtype looking for all ibis geography-y types and returning the gpd.array.GeometryDtype().

Something like

if isinstance(ibis_dtype, ibis_dtypes.GeoSpatial):
    return gpd.array.GeometryDtype()

around here:

Collaborator


The reason we can look for ibis_dtypes.GeoSpatial is that all the geography-y types have it as a superclass. See:
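The superclass dispatch described here can be sketched with stand-in classes. `GeoSpatial`, `Point`, and the `"geometry"` return value below are simplified stand-ins for `ibis_dtypes.GeoSpatial`, `ibis_dtypes.Point`, and `gpd.array.GeometryDtype()`; the point is that one `isinstance` check against the common base class covers every geography-y dtype at once.

```python
# Stand-ins for the ibis dtype hierarchy: every geography-y type
# (Point, LineString, Polygon, ...) subclasses GeoSpatial.
class GeoSpatial:
    pass

class Point(GeoSpatial):
    pass

class String:  # a non-geo dtype, for contrast
    pass

def ibis_dtype_to_bigframes_dtype(ibis_dtype):
    # Dispatch on the GeoSpatial superclass so all geo types map to the
    # geometry dtype instead of raising "Unexpected Ibis data type".
    if isinstance(ibis_dtype, GeoSpatial):
        return "geometry"  # stands in for gpd.array.GeometryDtype()
    raise ValueError(f"Unexpected Ibis data type {ibis_dtype}.")

print(ibis_dtype_to_bigframes_dtype(Point()))  # geometry
```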

Collaborator

@tswast tswast left a comment


Marking as request changes to cover the ibis_dtype_to_bigframes_dtype change for the notebook fix before merging.

arwas11 and others added 3 commits February 6, 2025 11:38
* chore: support timestamp subtractions

* Fix format

* use tree rewrites to dispatch timestamp_diff operator

* add TODO for more node updates

* polish the code and fix typos

* fix comment

* add rewrites to compile_raw and compile_peek_sql
* chore: add a tool to upload tpcds data to bigquery.

* update error type

* update docstring
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: m Pull request size is medium. labels Feb 6, 2025
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: xl Pull request size is extra large. labels Feb 6, 2025
@arwas11 arwas11 requested a review from tswast February 6, 2025 18:46
@tswast tswast removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Feb 6, 2025
@tswast tswast enabled auto-merge (squash) February 6, 2025 19:17
@tswast tswast merged commit 3c3e14c into main Feb 6, 2025
22 of 24 checks passed
@tswast tswast deleted the b308738592-geopandas-point-from-xy branch February 6, 2025 19:48
shuoweil pushed a commit that referenced this pull request Feb 6, 2025
* feat: add GeoSeries.from_xy

* add from_xy test and update ibis types

* update geoseries notebook with from_xy

* Update docstring example

* fix doctstring lint error

* return GeometryDtype() for all ibis geo types

* chore: support timestamp subtractions (#1346)

* chore: support timestamp subtractions

* Fix format

* use tree rewrites to dispatch timestamp_diff operator

* add TODO for more node updates

* polish the code and fix typos

* fix comment

* add rewrites to compile_raw and compile_peek_sql

* chore: add a tool to upload tpcds data to bigquery. (#1367)

* chore: add a tool to upload tpcds data to bigquery.

* update error type

* update docstring

---------

Co-authored-by: Shenyang Cai <[email protected]>
Co-authored-by: Huan Chen <[email protected]>