feat: df.join lsuffix and rsuffix support #1857

Genesis929 · 2025-06-26T00:22:33Z

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

bigframes/dataframe.py

tswast · 2025-07-01T20:24:56Z

tests/system/small/test_dataframe.py

+        ["string_col", "int64_col", "int64_too"]
+    ].rename(columns={"int64_too": "int64_col"})
+    pd_result = pd_df_a.join(pd_df_b, how=how, lsuffix="_l", rsuffix="_r")
+    print(pd_result)


Remove leftover print() statements.

PS. Adding --pdb to your pytest command line arguments makes dropping into a debugger to inspect variables really easy. https://docs.pytest.org/en/stable/how-to/failures.html#dropping-to-pdb-on-failures

tests/system/small/test_dataframe.py

tests/unit/test_dataframe_polars.py

tswast · 2025-07-01T20:27:44Z

tests/unit/test_dataframe_polars.py

+    if how == "cross":
+        return


I think it'd be worth adding a test that ValueError is not raised for this condition with a cross join.

Cross join actually raises a different error; match added.

tswast · 2025-07-15T18:39:11Z

bigframes/dataframe.py

+            f"bigframes_left_col_name_{i}" if col_name != on else on_col_name
+            for i, col_name in enumerate(left_col_original_names)
+        ]
+        left.columns = pandas.Index(left_col_temp_names)


This seems dangerous. We haven't made a copy of self, so I'm uncomfortable with mutating it. If we must do this, then please either:

1. make a copy of self first, or

2. put a finally block that resets the names back to the original in case anything went wrong.

I prefer (1) since it's less likely to cause problems if we're in a multi-threaded environment.

Updated to use left = self.copy().
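
For readers following the thread, here is a minimal sketch of option (1): rename on a copy so the caller's frame is never touched. It uses plain pandas as a stand-in for a bigframes DataFrame, and `rename_on_copy` is a hypothetical helper for illustration, not the code in this PR.

```python
import pandas as pd


def rename_on_copy(df: pd.DataFrame, prefix: str) -> pd.DataFrame:
    """Return a copy of df with temporary column names; df itself is not mutated."""
    left = df.copy()  # work on a copy, as requested in the review
    left.columns = pd.Index(
        [f"{prefix}_{i}" for i in range(len(left.columns))]
    )
    return left


# Stand-in data; the column names mirror those used in the PR's tests.
original = pd.DataFrame({"string_col": ["a", "b"], "int64_col": [1, 2]})
renamed = rename_on_copy(original, "bigframes_left_col_name")

# The caller's frame keeps its original column names.
assert list(original.columns) == ["string_col", "int64_col"]
```

Because the temporary names only ever exist on the copy, there is nothing to restore in a finally block and no window in which another thread could observe the renamed columns.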

tswast · 2025-07-15T18:39:35Z

bigframes/dataframe.py

+            f"bigframes_left_idx_name_{i}" for i in range(len(left_idx_original_names))
+        ]
+        if left._has_index:
+            left.index.names = left_idx_names_in_cols


Same here. Mutating the index is dangerous. Can we avoid this?

We need to avoid duplicate names when joining, or the column reordering won't work, so with the current join logic we can't avoid this.

tswast · 2025-07-15T18:39:47Z

bigframes/dataframe.py

+            f"bigframes_right_col_name_{i}"
+            for i in range(len(right_col_original_names))
+        ]
+        right.columns = pandas.Index(right_col_temp_names)


We need to avoid duplicate names when joining, or the column reordering won't work, so with the current join logic we can't avoid this.

tswast · 2025-07-15T18:41:44Z

bigframes/dataframe.py

+        right_columns,
+        lsuffix: str = "",
+        rsuffix: str = "",
+        extra_col: typing.Optional[str] = None,


Please add a docstring explaining this extra_col parameter and when it is intended to be used.

tswast · 2025-07-15T18:42:51Z

bigframes/dataframe.py

+                final_col_names.append(f"{col_name}{rsuffix}")
+            else:
+                final_col_names.append(col_name)
+        self.columns = pandas.Index(final_col_names)


I think we should only be modifying self if we're doing an inplace operation, right? Why is self getting changed? Can we avoid this?

In this function, self is actually combined_df, so it should be safe. Changed to self.copy() for additional safety.
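
As a rough illustration of the suffix step discussed here, the sketch below computes the final column names for a join, following pandas-style semantics: only names present on both sides receive a suffix, and overlapping names with no suffix at all raise ValueError. `resolve_join_column_names` is a hypothetical name and this is not the PR's implementation.

```python
from typing import List, Sequence


def resolve_join_column_names(
    left_cols: Sequence[str],
    right_cols: Sequence[str],
    lsuffix: str = "",
    rsuffix: str = "",
) -> List[str]:
    """Append lsuffix/rsuffix to column names that appear on both sides."""
    overlap = set(left_cols) & set(right_cols)
    if overlap and not lsuffix and not rsuffix:
        # Mirrors pandas, which refuses to produce duplicate column names.
        raise ValueError(
            f"columns overlap but no suffix specified: {sorted(overlap)}"
        )
    final_col_names: List[str] = []
    for col_name in left_cols:
        final_col_names.append(
            f"{col_name}{lsuffix}" if col_name in overlap else col_name
        )
    for col_name in right_cols:
        final_col_names.append(
            f"{col_name}{rsuffix}" if col_name in overlap else col_name
        )
    return final_col_names


names = resolve_join_column_names(
    ["string_col", "int64_col"],
    ["string_col", "int64_col", "bool_col"],
    lsuffix="_l",
    rsuffix="_r",
)
```

Returning a new list, rather than assigning to the columns of an existing frame, keeps the naming logic free of side effects, so the caller decides which frame (ideally a copy) receives the names.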

tswast · 2025-07-15T18:44:09Z

tests/system/small/test_dataframe.py

+    bf_df_b = scalars_df_index.dropna()[
+        ["string_col", "int64_col", "int64_too"]
+    ].rename(columns={"int64_too": "int64_col"})
+    bf_result = bf_df_a.join(bf_df_b, how=how, lsuffix="_l", rsuffix="_r").to_pandas()


Can we add some checks that bf_df_a's column names and index names didn't get modified?
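
One way such a check could look, sketched with plain pandas frames standing in for bf_df_a and bf_df_b (the real test would use the bigframes fixtures, and the data here is invented for illustration):

```python
import pandas as pd

# Stand-in frames with overlapping column names, as in the test above.
df_a = pd.DataFrame({"string_col": ["a", "b"], "int64_col": [1, 2]})
df_b = pd.DataFrame({"string_col": ["c", "d"], "int64_col": [3, 4]})

# Snapshot the names before the join.
cols_before = list(df_a.columns)
index_names_before = list(df_a.index.names)

result = df_a.join(df_b, how="left", lsuffix="_l", rsuffix="_r")

# The join must not leak temporary or suffixed names back into the input.
assert list(df_a.columns) == cols_before
assert list(df_a.index.names) == index_names_before
```

Snapshotting the names as plain lists before the call avoids comparing against an object that the join itself might have mutated.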

feat: df.join lsuffix and rsuffix support

8e85a0b

product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Jun 26, 2025

Genesis929 and others added 6 commits June 26, 2025 00:50

raise error when on is duplicated.

515c985

rename

481a6bb

Merge branch 'main' into join_suffix

e66a0a1

error update.

798d3d5

Merge branch 'main' into join_suffix

14a1c54

test fix.

8c6630b

Genesis929 added the owlbot:run Add this label to trigger the Owlbot post processor. label Jun 26, 2025

gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jun 26, 2025

add doc and test fixes

69fa715

product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Jun 26, 2025

Genesis929 and others added 2 commits June 26, 2025 11:41

Merge branch 'main' into join_suffix

9748b35

skip pandas 1.x test

53ef0cc

Genesis929 added the owlbot:run Add this label to trigger the Owlbot post processor. label Jun 26, 2025

gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Jun 26, 2025

Genesis929 marked this pull request as ready for review June 26, 2025 19:48

Genesis929 requested review from a team as code owners June 26, 2025 19:48

Genesis929 requested a review from tswast June 26, 2025 19:48

blunderbuss-gcf bot assigned TrevorBergeron Jun 26, 2025

Genesis929 requested a review from TrevorBergeron June 26, 2025 19:48

Merge branch 'main' into join_suffix

8b09d10

Genesis929 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jun 26, 2025

bigframes-bot removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Jun 26, 2025

tswast reviewed Jul 1, 2025

Genesis929 and others added 2 commits July 7, 2025 18:03

test fixes

4e80220

Merge branch 'main' into join_suffix

052e090

Genesis929 and others added 4 commits July 7, 2025 18:37

create join on key helper function

d661ea6

test fix

1ba81a4

test fix

014bb73

Merge branch 'main' into join_suffix

cd4d962

Genesis929 requested a review from tswast July 7, 2025 20:37

Genesis929 added 2 commits July 10, 2025 13:16

Merge branch 'main' into join_suffix

12464f2

Merge branch 'main' into join_suffix

6892d84

tswast reviewed Jul 15, 2025

update join to avoid inplace changes.

2ced5af

Genesis929 requested a review from tswast July 28, 2025 20:09

Merge branch 'main' into join_suffix

8b5c94a
