-
Notifications
You must be signed in to change notification settings - Fork 62
feat: Implement single-column sorting for interactive table widget #2255
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
65d90c2
75174e3
a31771a
021d35a
f5420d2
8fac06c
0e40d69
b4dcce7
0680139
688ec48
b0f051c
eb2f648
e4e302c
7f747b7
07634b9
9669a39
6abc1d6
580b492
96e49eb
a708c57
25a6ade
a04c92b
7d48abd
f9bac38
7b87d2a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,20 @@ | ||
| name: js-tests | ||
| on: | ||
| pull_request: | ||
| branches: | ||
| - main | ||
| push: | ||
| branches: | ||
| - main | ||
| jobs: | ||
| build: | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - name: Checkout | ||
| uses: actions/checkout@v4 | ||
| - name: Install modules | ||
| working-directory: ./tests/js | ||
| run: npm install | ||
| - name: Run tests | ||
| working-directory: ./tests/js | ||
| run: npm test |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -16,6 +16,7 @@ | |
|
|
||
| from __future__ import annotations | ||
|
|
||
| import dataclasses | ||
| from importlib import resources | ||
| import functools | ||
| import math | ||
|
|
@@ -28,6 +29,7 @@ | |
| from bigframes.core import blocks | ||
| import bigframes.dataframe | ||
| import bigframes.display.html | ||
| import bigframes.dtypes as dtypes | ||
|
|
||
| # anywidget and traitlets are optional dependencies. We don't want the import of | ||
| # this module to fail if they aren't installed, though. Instead, we try to | ||
|
|
@@ -48,6 +50,12 @@ | |
| WIDGET_BASE = object | ||
|
|
||
|
|
||
| @dataclasses.dataclass(frozen=True) | ||
| class _SortState: | ||
| column: str | ||
| ascending: bool | ||
|
|
||
|
|
||
| class TableWidget(WIDGET_BASE): | ||
| """An interactive, paginated table widget for BigFrames DataFrames. | ||
|
|
||
|
|
@@ -63,6 +71,9 @@ class TableWidget(WIDGET_BASE): | |
| allow_none=True, | ||
| ).tag(sync=True) | ||
| table_html = traitlets.Unicode().tag(sync=True) | ||
| sort_column = traitlets.Unicode("").tag(sync=True) | ||
| sort_ascending = traitlets.Bool(True).tag(sync=True) | ||
| orderable_columns = traitlets.List(traitlets.Unicode(), []).tag(sync=True) | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the general case, column names could be any "Hashable" value, including integers import bigframes.pandas as bpd
import pandas as pd
bpd.options.bigquery.project = "swast-scratch"
df = bpd.DataFrame([[0, 1], [2, 3]], columns=pd.Index([1, 2]))
print(df)
print(df[1])
print(df[2])Also, another very important case is the MultiIndex: df = bpd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
'two'],
'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
'baz': [1, 2, 3, 4, 5, 6],
'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
pdf = df.pivot(index='foo', columns='bar', values=['baz', 'zoo'])
print(pdf.columns)
print(pdf[("baz", "A")])This isn't needed for SQL Cell, but we do need to make sure we function correctly in the general case. Could you test with these types of DataFrames? If it's not feasible to support these in this PR, please avoid doing the sorting feature for those DataFrames for now and create a bug and a TODO to expand our support for any column label.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I disable these two cases for now and filed b/463754889 |
||
| _initial_load_complete = traitlets.Bool(False).tag(sync=True) | ||
| _batches: Optional[blocks.PandasBatches] = None | ||
| _error_message = traitlets.Unicode(allow_none=True, default_value=None).tag( | ||
|
|
@@ -89,15 +100,25 @@ def __init__(self, dataframe: bigframes.dataframe.DataFrame): | |
| self._all_data_loaded = False | ||
| self._batch_iter: Optional[Iterator[pd.DataFrame]] = None | ||
| self._cached_batches: List[pd.DataFrame] = [] | ||
| self._last_sort_state: Optional[_SortState] = None | ||
|
|
||
| # respect display options for initial page size | ||
| initial_page_size = bigframes.options.display.max_rows | ||
|
|
||
| # set traitlets properties that trigger observers | ||
| # TODO(b/462525985): Investigate and improve TableWidget UX for DataFrames with a large number of columns. | ||
| self.page_size = initial_page_size | ||
| # TODO(b/463754889): Support non-string column labels for sorting. | ||
| if all(isinstance(col, str) for col in dataframe.columns): | ||
| self.orderable_columns = [ | ||
| str(col_name) | ||
| for col_name, dtype in dataframe.dtypes.items() | ||
| if dtypes.is_orderable(dtype) | ||
| ] | ||
| else: | ||
| self.orderable_columns = [] | ||
|
|
||
| # len(dataframe) is expensive, since it will trigger a | ||
| # SELECT COUNT(*) query. It is a must have however. | ||
| # obtain the row counts | ||
| # TODO(b/428238610): Start iterating over the result of `to_pandas_batches()` | ||
| # before we get here so that the count might already be cached. | ||
| self._reset_batches_for_new_page_size() | ||
|
|
@@ -121,6 +142,11 @@ def __init__(self, dataframe: bigframes.dataframe.DataFrame): | |
| # Also used as a guard to prevent observers from firing during initialization. | ||
| self._initial_load_complete = True | ||
|
|
||
| @traitlets.observe("_initial_load_complete") | ||
| def _on_initial_load_complete(self, change: Dict[str, Any]): | ||
| if change["new"]: | ||
| self._set_table_html() | ||
|
|
||
| @functools.cached_property | ||
| def _esm(self): | ||
| """Load JavaScript code from external file.""" | ||
|
|
@@ -221,13 +247,17 @@ def _cached_data(self) -> pd.DataFrame: | |
| return pd.DataFrame(columns=self._dataframe.columns) | ||
| return pd.concat(self._cached_batches, ignore_index=True) | ||
|
|
||
| def _reset_batch_cache(self) -> None: | ||
| """Resets batch caching attributes.""" | ||
| self._cached_batches = [] | ||
| self._batch_iter = None | ||
| self._all_data_loaded = False | ||
|
|
||
| def _reset_batches_for_new_page_size(self) -> None: | ||
| """Reset the batch iterator when page size changes.""" | ||
| self._batches = self._dataframe._to_pandas_batches(page_size=self.page_size) | ||
|
|
||
| self._cached_batches = [] | ||
| self._batch_iter = None | ||
| self._all_data_loaded = False | ||
| self._reset_batch_cache() | ||
|
|
||
| def _set_table_html(self) -> None: | ||
| """Sets the current html data based on the current page and page size.""" | ||
|
|
@@ -237,6 +267,21 @@ def _set_table_html(self) -> None: | |
| ) | ||
| return | ||
|
|
||
| # Apply sorting if a column is selected | ||
| df_to_display = self._dataframe | ||
| if self.sort_column: | ||
| # TODO(b/463715504): Support sorting by index columns. | ||
| df_to_display = df_to_display.sort_values( | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could we file a bug and a TODO to support sorting by the index column(s) as well? For those, we'd use sort_index.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. filed b/463715504 |
||
| by=self.sort_column, ascending=self.sort_ascending | ||
| ) | ||
|
|
||
| # Reset batches when sorting changes | ||
| if self._last_sort_state != _SortState(self.sort_column, self.sort_ascending): | ||
| self._batches = df_to_display._to_pandas_batches(page_size=self.page_size) | ||
| self._reset_batch_cache() | ||
| self._last_sort_state = _SortState(self.sort_column, self.sort_ascending) | ||
| self.page = 0 # Reset to first page | ||
|
|
||
| start = self.page * self.page_size | ||
| end = start + self.page_size | ||
|
|
||
|
|
@@ -272,8 +317,14 @@ def _set_table_html(self) -> None: | |
| self.table_html = bigframes.display.html.render_html( | ||
| dataframe=page_data, | ||
| table_id=f"table-{self._table_id}", | ||
| orderable_columns=self.orderable_columns, | ||
| ) | ||
|
|
||
| @traitlets.observe("sort_column", "sort_ascending") | ||
| def _sort_changed(self, _change: Dict[str, Any]): | ||
| """Handler for when sorting parameters change from the frontend.""" | ||
| self._set_table_html() | ||
|
|
||
| @traitlets.observe("page") | ||
| def _page_changed(self, _change: Dict[str, Any]) -> None: | ||
| """Handler for when the page number is changed from the frontend.""" | ||
|
|
@@ -288,6 +339,9 @@ def _page_size_changed(self, _change: Dict[str, Any]) -> None: | |
| return | ||
| # Reset the page to 0 when page size changes to avoid invalid page states | ||
| self.page = 0 | ||
| # Reset the sort state to default (no sort) | ||
| self.sort_column = "" | ||
| self.sort_ascending = True | ||
|
|
||
| # Reset batches to use new page size for future data fetching | ||
| self._reset_batches_for_new_page_size() | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we considered having multiple columns as a possibility? I think a single column is a good starting point, but I think it's an alternative worth considering, especially when a particular column contains lots of duplicate values, like a "date" column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that multi-column sorting is particularly valuable when a column has many duplicate values. I would like to get the single column sorting checked in first as a PR. Then check in a second PR for multi-column sorting. This current PR is already complex enough. I prefer two separate PRs as enhancements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, separate PR makes sense to me, thanks.