Skip to content

add data_columns to doc string #13065

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

michaelaye
Copy link
Contributor

@michaelaye michaelaye commented May 3, 2016

How about this for #13061 ?
I don't think pandas has tests for each documented field, correct?
Do you want a whatsnew entry for this? If yes, how does that work? The guidelines don't talk about that.

@jreback
Copy link
Contributor

jreback commented May 3, 2016

I don't think pandas has tests for each documented field, correct?

not sure what you mean by this. pandas has tests for each option of course. doc-strings in the past were not audited as much as currently.

The doc-string should match for the same keyword args. Please change to match.

In [1]: pd.HDFStore.append?
Signature: pd.HDFStore.append(self, key, value, format=None, append=True, columns=None, dropna=None, **kwargs)
Docstring:
Append to Table in file. Node must already exist and be Table
format.

Parameters
----------
key : object
value : {Series, DataFrame, Panel, Panel4D}
format: 'table' is the default
    table(t) : table format
               Write as a PyTables Table structure which may perform
               worse but allow more flexible operations like searching
               / selecting subsets of the data
append       : boolean, default True, append the input data to the
    existing
data_columns : list of columns to create as data columns, or True to
    use all columns
min_itemsize : dict of columns that specify minimum string sizes
nan_rep      : string to use as string nan represenation
chunksize    : size to chunk the writing
expectedrows : expected TOTAL row size of this table
encoding     : default None, provide an encoding for strings
dropna       : boolean, default False, do not write an ALL nan row to
    the store settable by the option 'io.hdf.dropna_table'
Notes
-----
Does *not* check if data being appended overlaps with existing
data in the table, so be careful
File:      ~/pandas/pandas/io/pytables.py
Type:      instancemethod

@jreback jreback added the Docs label May 3, 2016
@@ -1084,6 +1084,9 @@ def to_hdf(self, path_or_buf, key, **kwargs):
/ selecting subsets of the data
append : boolean, default False
For Table formats, append the input data to the existing
data_columns : list of columns to create as data columns, or True to
use all columns. This will create additional indexed columns for
on-disk queries, by default only 'index' and 'columns' are indexed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is 'columns'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same columns as written in your docs? See here: http://pandas.pydata.org/pandas-docs/stable/io.html#querying-a-table

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by default the axes of the object are indexed (e.g. index for Series, index and columns for DataFrame)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, it's just not really clear from the docstring (but it would also take to much space there to explain).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it should be improved in the io docs?

@michaelaye
Copy link
Contributor Author

  1. Do you want me to expand the docstring in .append()? Or want it in .to_hdf() as short as it is in .append()
  2. I cannot find anywhere to_hdf tests that test for it's individual options. Only mentioning of to_hdf I can find is here.

@jreback
Copy link
Contributor

jreback commented May 4, 2016

to_hdf just calls the underlying functions it's all tested directly on HDFStore. to_hdf is just a wrapper around this.

you can expand the .append doc string if u want and add that to to_hdf

@michaelaye
Copy link
Contributor Author

I just noticed that columns in HDFStore.append() is not supported anymore:

        if columns is not None:
            raise TypeError("columns is not a supported keyword in append, "
                            "try data_columns")

Should the append=None not be removed then from the call signature?

@jreback
Copy link
Contributor

jreback commented May 4, 2016

@michaelaye what does columns have to do with append? (that is an older / deprecated kw)

@michaelaye
Copy link
Contributor Author

That's how it looks like, yes. But it's still there, in the method signature, see above, where you copied it in yourself.

@codecov-io
Copy link

codecov-io commented May 4, 2016

Current coverage is 84.14%

Merging #13065 into master will decrease coverage by <.01%

@@             master     #13065   diff @@
==========================================
  Files           138        137     -1   
  Lines         50587      50261   -326   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          42593      42288   -305   
+ Misses         7994       7973    -21   
  Partials          0          0          

Powered by Codecov. Last updated by b4e2d34...8a36c36

@@ -1084,6 +1084,9 @@ def to_hdf(self, path_or_buf, key, **kwargs):
/ selecting subsets of the data
append : boolean, default False
For Table formats, append the input data to the existing
data_columns : list of columns to create as data columns, or True to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data_columns : list of columns, or True, default None
       ...explanation on next line

@michaelaye
Copy link
Contributor Author

michaelaye commented May 6, 2016

What about the obsolete "columns" keyword in the Store.append call()? Different problem I guess, but I don't really understand why to keep it? Not at least for backwards compatibility, because it raises a TypeError now, and so it would if the keyword would be removed and somebody still would try to use it.

@jreback
Copy link
Contributor

jreback commented May 6, 2016

its to provide a custom message with a hint, meaning columns was passed, but maybe you meant data_columns.

columns is a valid kw for .select, so people might try it with append.

@jorisvandenbossche
Copy link
Member

This looks good to me. @jreback ?

@jorisvandenbossche jorisvandenbossche added this to the 0.18.2 milestone May 11, 2016
data_columns : list of columns, or True, default None
This will create additional indexed columns for on-disk queries,
by default only 'index' and 'columns' are indexed. True will index
all columns.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only true for a data frame (and not true for other objects); by default the axes are indexes. e.g. index and columns might be better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm sorry, it's not clear what exactly you want changed how.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a generic doc-string that applies when you are looknig at Series,DataFrame, or Panel. so it should either be somewhat generic or specific to that type (which is harder and not worth it). You are saying 'index' and 'columns', which don't apply to Series/Panel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can we make this specific to Series,DataFrame and Panel? Because considering the complexity of pandas I find it highly important to have the docstrings as useful as possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case it might not be easy to do this because someone is doing:

``HDFStore.append, there isn't any object at that point, so you can just give an example e.g. Series -> 'index' is defined, DataFrame -> 'index' and 'columns' ,Panel -> 'items', 'major_axis','minor_axis'` (put them in bullet points).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, for a Panel I actually can not add columns to the index?
If that's true, how about then I add the docstring like so:

data_columns : only applicable to DataFrames, see examples

and then I add the usage examples with the suggestions you made above?

@jreback
Copy link
Contributor

jreback commented May 25, 2016

can you rebase / update?

@michaelaye
Copy link
Contributor Author

I see the problem now, maybe this is the reason why the docstring was left out in the first place? What should I do? I guess the data_columns option as actually being ignored if it would be used with a Series.to_hdf? I just tried to use it with a non-existing column name and no error appeared.

@jreback
Copy link
Contributor

jreback commented Jun 30, 2016

can you update

@jreback jreback removed this from the 0.18.2 milestone Jun 30, 2016
@michaelaye
Copy link
Contributor Author

I repeat my previous question from May 25 (under the code discussion at the change overview), for which I still am waiting for guidance:

"So, for a Panel I actually can not add columns to the index?
If that's true, how about then I add the docstring like so:

data_columns : only applicable to DataFrames, see examples

and then I add the usage examples with the suggestions you made above?

@jreback
Copy link
Contributor

jreback commented Jul 1, 2016

@michaelaye I dont' think can use a shared doc here as this is a doc-string on the class (and not on an object yet). So put a more general doc-string should be enough (e.g. give an example for series, dataframe, panel for which axes you can use).

@jorisvandenbossche
Copy link
Member

@michaelaye I cherry-picked your commit to combine it with the changes of the other merged PR

@jorisvandenbossche
Copy link
Member

-> rebased in #14046

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: DataFrame.to_hdf() docstring does not mention data_columns
4 participants