Merged
25 commits
65d90c2
feat: Implement column sorting for interactive table widget
shuoweil Nov 11, 2025
75174e3
Merge branch 'main' into shuowei-anywidget-sort-by-col-name
shuoweil Nov 12, 2025
a31771a
update error handling and introduce three stages for sort
shuoweil Nov 12, 2025
021d35a
Merge branch 'main' into shuowei-anywidget-sort-by-col-name
shuoweil Nov 14, 2025
f5420d2
change to hoverable dot
shuoweil Nov 14, 2025
8fac06c
make arrow visible after sorting
shuoweil Nov 14, 2025
0e40d69
merge main
shuoweil Nov 18, 2025
b4dcce7
Merge branch 'main' into shuowei-anywidget-sort-by-col-name
shuoweil Nov 20, 2025
0680139
remove unnecessary exception catch and use dataclass
shuoweil Nov 21, 2025
688ec48
add js unit test framework
shuoweil Nov 21, 2025
b0f051c
bug fix to display table in notebook
shuoweil Nov 21, 2025
eb2f648
fix: nox system-3.9 run
shuoweil Nov 21, 2025
e4e302c
Revert "fix: nox system-3.9 run"
shuoweil Nov 21, 2025
7f747b7
add reset
shuoweil Nov 21, 2025
07634b9
Deduplication
shuoweil Nov 21, 2025
9669a39
Merge branch 'main' into shuowei-anywidget-sort-by-col-name
shuoweil Nov 22, 2025
6abc1d6
Merge branch 'main' into shuowei-anywidget-sort-by-col-name
shuoweil Nov 24, 2025
580b492
Update bigframes/display/table_widget.js
shuoweil Nov 24, 2025
96e49eb
code refactor
shuoweil Nov 24, 2025
a708c57
revert a testcase change
shuoweil Nov 24, 2025
25a6ade
Merge branch 'main' into shuowei-anywidget-sort-by-col-name
shuoweil Nov 25, 2025
a04c92b
disable sorting for integer or multiindex
shuoweil Nov 25, 2025
7d48abd
Merge branch 'main' into shuowei-anywidget-sort-by-col-name
shuoweil Nov 25, 2025
f9bac38
fix mypy
shuoweil Nov 25, 2025
7b87d2a
fix mypy
shuoweil Nov 25, 2025
20 changes: 20 additions & 0 deletions .github/workflows/js-tests.yml
@@ -0,0 +1,20 @@
name: js-tests
on:
pull_request:
branches:
- main
push:
branches:
- main
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Install modules
working-directory: ./tests/js
run: npm install
- name: Run tests
working-directory: ./tests/js
run: npm test
1 change: 1 addition & 0 deletions .gitignore
@@ -58,6 +58,7 @@ coverage.xml

# System test environment variables.
system_tests/local_test_setup
tests/js/node_modules/

# Make sure a generated file isn't accidentally committed.
pylintrc
64 changes: 59 additions & 5 deletions bigframes/display/anywidget.py
@@ -16,6 +16,7 @@

from __future__ import annotations

import dataclasses
from importlib import resources
import functools
import math
@@ -28,6 +29,7 @@
from bigframes.core import blocks
import bigframes.dataframe
import bigframes.display.html
import bigframes.dtypes as dtypes

# anywidget and traitlets are optional dependencies. We don't want the import of
# this module to fail if they aren't installed, though. Instead, we try to
@@ -48,6 +50,12 @@
WIDGET_BASE = object


@dataclasses.dataclass(frozen=True)
class _SortState:
column: str
ascending: bool


class TableWidget(WIDGET_BASE):
"""An interactive, paginated table widget for BigFrames DataFrames.

@@ -63,6 +71,9 @@ class TableWidget(WIDGET_BASE):
allow_none=True,
).tag(sync=True)
table_html = traitlets.Unicode().tag(sync=True)
sort_column = traitlets.Unicode("").tag(sync=True)
Collaborator

Have we considered sorting by multiple columns? A single column is a good starting point, but multi-column sorting is worth considering, especially when a particular column contains lots of duplicate values, like a "date" column.

Contributor Author

I agree that multi-column sorting is particularly valuable when a column has many duplicate values. I would like to land single-column sorting in this PR first, then follow up with a second PR for multi-column sorting. This PR is already complex enough, so I prefer to treat the two as separate enhancements.

Collaborator

Yes, separate PR makes sense to me, thanks.
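
To make the multi-column idea discussed above concrete, here is a minimal, library-free sketch of sorting with a tie-breaker column. The helper name `sort_rows` and the sample rows are hypothetical and not part of this PR. In pandas, the analogous call passes lists to the `by` and `ascending` parameters of `DataFrame.sort_values`.

```python
from operator import itemgetter


def sort_rows(rows, sort_keys):
    """Sort dict rows by (column, ascending) pairs; the first pair is most significant."""
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last yields a correct multi-key ordering.
    for column, ascending in reversed(sort_keys):
        result.sort(key=itemgetter(column), reverse=not ascending)
    return result


# Hypothetical data: two rows share the same date, so "amount" breaks the tie.
rows = [
    {"date": "2025-11-12", "amount": 2},
    {"date": "2025-11-11", "amount": 7},
    {"date": "2025-11-12", "amount": 5},
]
ordered = sort_rows(rows, [("date", True), ("amount", False)])
```

With a single sort column, the two rows dated 2025-11-12 would keep their input order. The second key makes their relative order deterministic.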

sort_ascending = traitlets.Bool(True).tag(sync=True)
orderable_columns = traitlets.List(traitlets.Unicode(), []).tag(sync=True)
Collaborator

For the general case, column names could be any "Hashable" value, including integers:

import bigframes.pandas as bpd
import pandas as pd
bpd.options.bigquery.project = "swast-scratch"
df = bpd.DataFrame([[0, 1], [2, 3]], columns=pd.Index([1, 2]))
print(df)
print(df[1])
print(df[2])

Also, another very important case is the MultiIndex:

df = bpd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
                           'two'],
                   'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'baz': [1, 2, 3, 4, 5, 6],
                   'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
pdf = df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
print(pdf.columns)
print(pdf[("baz", "A")])

This isn't needed for SQL Cell, but we do need to make sure we function correctly in the general case. Could you test with these types of DataFrames?

If it's not feasible to support these in this PR, please avoid doing the sorting feature for those DataFrames for now and create a bug and a TODO to expand our support for any column label.

Contributor Author

I disabled these two cases for now and filed b/463754889.
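
The guard described in this thread can be sketched without any DataFrame library. The function below mirrors the shape of the check added in `__init__` (all labels must be strings, otherwise no column is sortable), but `sortable_labels`, the `dtypes_by_label` mapping, and the `orderable` set are hypothetical stand-ins for illustration, not the PR's actual helpers.

```python
from typing import Callable, Hashable, List, Sequence


def sortable_labels(
    labels: Sequence[Hashable], is_orderable: Callable[[Hashable], bool]
) -> List[Hashable]:
    """Return the labels eligible for sorting, or [] if any label is not a string."""
    # Integer labels and MultiIndex tuples are not strings, so sorting is
    # disabled for the whole table in those cases.
    if not all(isinstance(label, str) for label in labels):
        return []
    return [label for label in labels if is_orderable(label)]


# Hypothetical dtype names; a real widget would consult the DataFrame's dtypes.
dtypes_by_label = {"name": "string", "payload": "json", "total": "int"}
orderable = {"string", "int"}

string_cols = sortable_labels(
    list(dtypes_by_label), lambda label: dtypes_by_label[label] in orderable
)
int_cols = sortable_labels([1, 2], lambda label: True)
multi_cols = sortable_labels([("baz", "A"), ("zoo", "B")], lambda label: True)
```

The all-or-nothing check is deliberately conservative: a single non-string label turns the feature off rather than risking a mismatch between the rendered header text and the real column label.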

_initial_load_complete = traitlets.Bool(False).tag(sync=True)
_batches: Optional[blocks.PandasBatches] = None
_error_message = traitlets.Unicode(allow_none=True, default_value=None).tag(
@@ -89,15 +100,25 @@ def __init__(self, dataframe: bigframes.dataframe.DataFrame):
self._all_data_loaded = False
self._batch_iter: Optional[Iterator[pd.DataFrame]] = None
self._cached_batches: List[pd.DataFrame] = []
self._last_sort_state: Optional[_SortState] = None

# respect display options for initial page size
initial_page_size = bigframes.options.display.max_rows

# set traitlets properties that trigger observers
# TODO(b/462525985): Investigate and improve TableWidget UX for DataFrames with a large number of columns.
self.page_size = initial_page_size
# TODO(b/463754889): Support non-string column labels for sorting.
if all(isinstance(col, str) for col in dataframe.columns):
self.orderable_columns = [
str(col_name)
for col_name, dtype in dataframe.dtypes.items()
if dtypes.is_orderable(dtype)
]
else:
self.orderable_columns = []

# len(dataframe) is expensive, since it will trigger a
# SELECT COUNT(*) query. It is a must have however.
# obtain the row counts
# TODO(b/428238610): Start iterating over the result of `to_pandas_batches()`
# before we get here so that the count might already be cached.
self._reset_batches_for_new_page_size()
@@ -121,6 +142,11 @@ def __init__(self, dataframe: bigframes.dataframe.DataFrame):
# Also used as a guard to prevent observers from firing during initialization.
self._initial_load_complete = True

@traitlets.observe("_initial_load_complete")
def _on_initial_load_complete(self, change: Dict[str, Any]):
if change["new"]:
self._set_table_html()

@functools.cached_property
def _esm(self):
"""Load JavaScript code from external file."""
@@ -221,13 +247,17 @@ def _cached_data(self) -> pd.DataFrame:
return pd.DataFrame(columns=self._dataframe.columns)
return pd.concat(self._cached_batches, ignore_index=True)

def _reset_batch_cache(self) -> None:
"""Resets batch caching attributes."""
self._cached_batches = []
self._batch_iter = None
self._all_data_loaded = False

def _reset_batches_for_new_page_size(self) -> None:
"""Reset the batch iterator when page size changes."""
self._batches = self._dataframe._to_pandas_batches(page_size=self.page_size)

self._cached_batches = []
self._batch_iter = None
self._all_data_loaded = False
self._reset_batch_cache()

def _set_table_html(self) -> None:
"""Sets the current html data based on the current page and page size."""
@@ -237,6 +267,21 @@ def _set_table_html(self) -> None:
)
return

# Apply sorting if a column is selected
df_to_display = self._dataframe
if self.sort_column:
# TODO(b/463715504): Support sorting by index columns.
df_to_display = df_to_display.sort_values(
Collaborator

Could we file a bug and a TODO to support sorting by the index column(s) as well? For those, we'd use sort_index.

Contributor Author

Filed b/463715504.

by=self.sort_column, ascending=self.sort_ascending
)

# Reset batches when sorting changes
if self._last_sort_state != _SortState(self.sort_column, self.sort_ascending):
self._batches = df_to_display._to_pandas_batches(page_size=self.page_size)
self._reset_batch_cache()
self._last_sort_state = _SortState(self.sort_column, self.sort_ascending)
self.page = 0 # Reset to first page

start = self.page * self.page_size
end = start + self.page_size

@@ -272,8 +317,14 @@ def _set_table_html(self) -> None:
self.table_html = bigframes.display.html.render_html(
dataframe=page_data,
table_id=f"table-{self._table_id}",
orderable_columns=self.orderable_columns,
)

@traitlets.observe("sort_column", "sort_ascending")
def _sort_changed(self, _change: Dict[str, Any]):
"""Handler for when sorting parameters change from the frontend."""
self._set_table_html()

@traitlets.observe("page")
def _page_changed(self, _change: Dict[str, Any]) -> None:
"""Handler for when the page number is changed from the frontend."""
@@ -288,6 +339,9 @@ def _page_size_changed(self, _change: Dict[str, Any]) -> None:
return
# Reset the page to 0 when page size changes to avoid invalid page states
self.page = 0
# Reset the sort state to default (no sort)
self.sort_column = ""
self.sort_ascending = True

# Reset batches to use new page size for future data fetching
self._reset_batches_for_new_page_size()
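
The sort-change detection in `_set_table_html` relies on value equality of a frozen dataclass: a new state object is compared against the last one seen, and the cache is reset only when they differ. The sketch below isolates that pattern. `SortState` mirrors the PR's `_SortState`, while `SortTracker` and its `resets` counter are hypothetical and stand in for the widget's batch-cache reset.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass(frozen=True)
class SortState:
    column: str
    ascending: bool


class SortTracker:
    """Hypothetical helper that counts how often the sort state really changes."""

    def __init__(self) -> None:
        self._last: Optional[SortState] = None
        self.resets = 0

    def update(self, column: str, ascending: bool) -> bool:
        current = SortState(column, ascending)
        # Dataclasses compare field by field, so an identical request does
        # not count as a change; None never equals a SortState instance.
        if current != self._last:
            self._last = current
            self.resets += 1  # stands in for resetting batches and the page
            return True
        return False


tracker = SortTracker()
first = tracker.update("name", True)
repeat = tracker.update("name", True)
flipped = tracker.update("name", False)
```

Because the initial state is `None`, the first call always registers as a change, which matches the first render fetching a fresh set of batches.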
18 changes: 16 additions & 2 deletions bigframes/display/html.py
@@ -17,14 +17,15 @@
from __future__ import annotations

import html
from typing import Any

import pandas as pd
import pandas.api.types

from bigframes._config import options


def _is_dtype_numeric(dtype) -> bool:
def _is_dtype_numeric(dtype: Any) -> bool:
"""Check if a dtype is numeric for alignment purposes."""
return pandas.api.types.is_numeric_dtype(dtype)

@@ -33,18 +34,31 @@ def render_html(
*,
dataframe: pd.DataFrame,
table_id: str,
orderable_columns: list[str] | None = None,
) -> str:
"""Render a pandas DataFrame to HTML with specific styling."""
classes = "dataframe table table-striped table-hover"
table_html = [f'<table border="1" class="{classes}" id="{table_id}">']
precision = options.display.precision
orderable_columns = orderable_columns or []

# Render table head
table_html.append(" <thead>")
table_html.append(' <tr style="text-align: left;">')
for col in dataframe.columns:
th_classes = []
if col in orderable_columns:
th_classes.append("sortable")
class_str = f'class="{" ".join(th_classes)}"' if th_classes else ""
header_div = (
'<div style="resize: horizontal; overflow: auto; '
"box-sizing: border-box; width: 100%; height: 100%; "
'padding: 0.5em;">'
f"{html.escape(str(col))}"
"</div>"
)
table_html.append(
f' <th style="text-align: left;"><div style="resize: horizontal; overflow: auto; box-sizing: border-box; width: 100%; height: 100%; padding: 0.5em;">{html.escape(str(col))}</div></th>'
f' <th style="text-align: left;" {class_str}>{header_div}</th>'
)
table_html.append(" </tr>")
table_html.append(" </thead>")
38 changes: 36 additions & 2 deletions bigframes/display/table_widget.css
@@ -28,7 +28,10 @@
align-items: center;
display: flex;
font-size: 0.8rem;
padding-top: 8px;
justify-content: space-between;
padding: 8px;
font-family:
-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
}

.bigframes-widget .footer > * {
@@ -44,6 +47,14 @@
padding: 4px;
}

.bigframes-widget .page-indicator {
margin: 0 8px;
}

.bigframes-widget .row-count {
margin: 0 8px;
}

.bigframes-widget .page-size {
align-items: center;
display: flex;
@@ -52,19 +63,31 @@
justify-content: end;
}

.bigframes-widget .page-size label {
margin-right: 8px;
}

.bigframes-widget table {
border-collapse: collapse;
text-align: left;
}

.bigframes-widget th {
background-color: var(--colab-primary-surface-color, var(--jp-layout-color0));
/* Uncomment once we support sorting: cursor: pointer; */
position: sticky;
top: 0;
z-index: 1;
}

.bigframes-widget th .sort-indicator {
padding-left: 4px;
visibility: hidden;
}

.bigframes-widget th:hover .sort-indicator {
visibility: visible;
}

.bigframes-widget button {
cursor: pointer;
display: inline-block;
@@ -78,3 +101,14 @@
opacity: 0.65;
pointer-events: none;
}

.bigframes-widget .error-message {
font-family:
-apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, sans-serif;
font-size: 14px;
padding: 8px;
margin-bottom: 8px;
border: 1px solid red;
border-radius: 4px;
background-color: #ffebee;
}