Skip to content

BUG: to_html() with formatters=<list> and max_cols fixed #28183

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 16, 2019
Merged
Show file tree
Hide file tree
Changes from 5 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.0.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,7 @@ Performance improvements
Bug fixes
~~~~~~~~~

- Bug in :meth:`DataFrame.to_html` when using ``formatters=<list>`` and ``max_cols`` together. (:issue:`25955`)

Categorical
^^^^^^^^^^^
Expand Down
4 changes: 4 additions & 0 deletions pandas/io/formats/format.py
Original file line number Diff line number Diff line change
Expand Up @@ -656,6 +656,10 @@ def _chk_truncate(self) -> None:
frame = concat(
(frame.iloc[:, :col_num], frame.iloc[:, -col_num:]), axis=1
)
# truncate formatter
if is_list_like(self.formatters) and self.formatters:
truncate_fmt = cast(List[Callable], self.formatters)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity what complain was this giving? cast doesn't seem necessary given assignment back to self.formatters but I may be missing something

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually a mypy complain, when running ci/code_checks.sh, about the possibility of self.formatters not been a list.
image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm OK. I think this is somewhat misleading though as is_list_like will return True for Tuples as well, so restricting this to List might not be totally correct (though I can certainly see where one would think that...)

Does changing this annotation to Sequence[Callable] still pass Mypy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's passes! Although it's necessary to import Sequences in typing and change the concat operation from self.formatters = fmt[slice] + fmt[slice] to self.formatters = [*fmt[slice],*fmt[slice]] . May I make a commit with theses changes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm what type fails if you keep the code as is but just change annotation? Want to make sure if we change that we have appropriate test coverage

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just want to avoid any / all conflicts between annotations, documentation and implementation

makes sense. At the moment we have a conflict between the documentation and the implementation. so that could be confusing matters.

Also, the wording for to_latex is slightly different from to_string and to_html.

they all share the same code.

maybe we should have a def _validate_formatters_kwarg(formatters): since I think the length check is important if this truncation is added.

at the moment, if the list is too short, it raises and if the list is too long it ignores the additional elements.

however, with the slicing from the start and end, weird things might happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gabriellm1 in summary..

we want to avoid the cast.

so for now we should probably go with if isinstance(self.formatters, (list, tuple)): instead of is_list_like(self.formatters) and self.formatters

this would be consistent with the check in _get_formatter

however, you would still need to do self.formatters = [*fmt[slice],*fmt[slice]] to satisfy mypy. I can't find a mypy issue for this case, otherwise we could have added a # type: ignore with the issue number.

I think validating the length of the list should be done in this PR, but updating the docs is outside the scope so could be done as a follow-up.

@WillAyd lmk if you disagree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cast is gone. Now what about length validation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we only do the length validation (and don't check that the parameter is not a string or that the items are callables) in this PR, then a separate function is probably unnecessary.

so just raise with a message along the lines of "The 'formatters' parameter must be of length equal to the number of columns."?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I got it. The validation would happen in the same place I made the changes? And what did you mean with "raise with a message"?

self.formatters = truncate_fmt[:col_num] + truncate_fmt[-col_num:]
self.tr_col_num = col_num
if truncate_v:
# cast here since if truncate_v is True, max_rows_adj is not None
Expand Down
36 changes: 36 additions & 0 deletions pandas/tests/io/formats/data/html/truncate_formatter.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>A</th>
<th>...</th>
<th>D</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>1_mod</td>
<td>...</td>
<td>4</td>
</tr>
<tr>
<th>1</th>
<td>5_mod</td>
<td>...</td>
<td>8</td>
</tr>
<tr>
<th>2</th>
<td>9_mod</td>
<td>...</td>
<td>12</td>
</tr>
<tr>
<th>3</th>
<td>13_mod</td>
<td>...</td>
<td>16</td>
</tr>
</tbody>
</table>
17 changes: 17 additions & 0 deletions pandas/tests/io/formats/test_to_html.py
Original file line number Diff line number Diff line change
Expand Up @@ -235,6 +235,23 @@ def test_to_html_truncate(datapath):
assert result == expected


def test_to_html_truncate_formatter(datapath):
# issue-25955
data = [
{"A": 1, "B": 2, "C": 3, "D": 4},
{"A": 5, "B": 6, "C": 7, "D": 8},
{"A": 9, "B": 10, "C": 11, "D": 12},
{"A": 13, "B": 14, "C": 15, "D": 16},
]

df = DataFrame(data)
fmt = lambda x: str(x) + "_mod"
formatters = [fmt, fmt, None, None]
result = df.to_html(formatters=formatters, max_cols=3)
expected = expected_html(datapath, "truncate_formatter")
assert result == expected


@pytest.mark.parametrize(
"sparsify,expected",
[(True, "truncate_multi_index"), (False, "truncate_multi_index_sparse_off")],
Expand Down