BUG SparseDataFrame with dense Series (#19374) #19377
def test_constructor_from_unknown_type(self):
    class Unknown:
        pass
    pytest.raises(TypeError, SparseDataFrame, Unknown())
Let's check the error message for all of your pytest.raises calls.
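A minimal sketch of what checking the message can look like, not the code from this PR. `build_frame` and its message text are hypothetical stand-ins for the constructor under test, and the standard-library `assertRaisesRegex` is used here instead of pytest so the snippet has no third-party dependencies.

```python
import unittest


class Unknown:
    """Stand-in for an unsupported input type, as in the test above."""


def build_frame(data):
    # Hypothetical constructor stand-in (not pandas API). It rejects
    # unsupported input with a descriptive message for the test to pin down.
    if not isinstance(data, (dict, list)):
        raise TypeError(
            "constructor called with an unsupported type: "
            + type(data).__name__
        )
    return data


def check_unknown_type_message():
    # Assert on the message as well as the exception type.
    # With pytest the same check would be spelled:
    #     with pytest.raises(TypeError, match="unsupported type"):
    case = unittest.TestCase()
    with case.assertRaisesRegex(TypeError, "unsupported type: Unknown"):
        build_frame(Unknown())
    return True
```

Matching on the message guards against a `TypeError` raised for an unrelated reason silently satisfying the test.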
@@ -199,6 +199,31 @@ def test_constructor_from_series(self):
        # without sparse value raises error
        # df2 = SparseDataFrame([x2_sparse, y])

    def test_constructor_from_dense_series(self):
Reference issue number under all of your added tests.
@datapythonista: Looks pretty good so far. Don't forget to add a
pls add a whatsnew note as well.
x = Series(np.random.randn(10000), name='a')
assert isinstance(x, Series)
df = SparseDataFrame(x)
assert isinstance(df, SparseDataFrame)
construct an expected SDF and compare
construct a DataFrame and use .to_sparse() to construct an expected frame. we do not want to do all of these little checks, we already have well established comparison functions for this, e.g. tm.assert_sparse_equal
assert isinstance(df, SparseDataFrame)
assert df.columns == ['b']

# No column name available
same for the rest
pandas/core/sparse/frame.py
Outdated
@@ -95,6 +95,13 @@ def __init__(self, data=None, index=None, columns=None, default_kind=None,
                              dtype=dtype, copy=copy)
        elif isinstance(data, DataFrame):
            mgr = self._init_dict(data, data.index, data.columns, dtype=dtype)
        elif isinstance(data, Series):
            if columns is None and data.name is None:
need a test to hit this case
Thanks a lot for the comments. Sorry about the whatsnew, I added it when opening the PR, but forgot to add it to the commit. @jreback, I didn't find a way to construct the expected Addressed all the other comments, let me know if you see anything else. Thanks!
pandas/core/sparse/frame.py
Outdated
elif len(columns) != 1:
    raise ValueError('columns must be of length one '
                     'if data is of type Series')
mgr = self._init_dict(data.to_frame(columns[0]),
don't construct like this, actually make dict
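For readers following the thread, here is a rough plain-Python sketch of the idea under discussion: resolve a single column label from the Series name or the `columns` argument, then build a dict directly rather than going through an intermediate frame. `series_to_column_dict` is a hypothetical helper, not pandas API, and the error messages are copied from the outdated hunks in this review. The later comments indicate the final change ended up simpler than this.

```python
def series_to_column_dict(values, name=None, columns=None):
    """Map one named sequence to a single-column dict (illustrative only)."""
    if columns is None:
        if name is None:
            # Neither a Series name nor explicit columns: no label to use.
            raise ValueError("cannot pass a series w/o a name or columns")
        columns = [name]
    elif len(columns) != 1:
        # A single Series can only populate exactly one column.
        raise ValueError(
            "columns must be of length one if data is of type Series"
        )
    # Build the dict directly instead of constructing a temporary frame.
    return {columns[0]: values}
```

For example, `series_to_column_dict([1, 2], name='a')` yields a dict with the single key `'a'`, while passing `columns=['b']` overrides the name.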
pandas/core/sparse/frame.py
Outdated
if columns is None:
    if data.name is None:
        raise ValueError('cannot pass a series '
                         'w/o a name or columns')
I don't think you need any of this, you can simply call to_manager with the column (in a list)
Thanks for the feedback @jreback, I misunderstood what the columns argument had to do, and I was overcomplicating the code. Now it simply creates the I'm not 100% sure I understood what you mean by "actually construct dict", please let me know if this new version doesn't address this too.
small edits, other lgtm. ping on green.
# GH 19393
# series with name
x = Series(np.random.randn(10000), name='a')
assert isinstance(x, Series)
you don't need this assert
# series with name
x = Series(np.random.randn(10000), name='a')
assert isinstance(x, Series)
res = SparseDataFrame(x)
I prefer result and expected
x = Series(np.random.randn(10000), name='a')
assert isinstance(x, Series)
res = SparseDataFrame(x)
assert isinstance(res, SparseDataFrame)
you don't need the assert
# series with no name
x = Series(np.random.randn(10000))
assert isinstance(x, Series)
res = SparseDataFrame(x)
same w.r.t naming & asserts
…r, and providing useful error messages for other types (#19374)
Thanks once again for the comments @jreback, seems that I followed the conventions of an old test. Addressed your comments, should be all right now.
thanks @datapythonista
git diff upstream/master -u -- "*.py" | flake8 --diff