|
| 1 | + |
1 | 2 | .. _io:
|
2 | 3 |
|
3 | 4 | .. currentmodule:: pandas
|
@@ -793,27 +794,151 @@ Objects can be written to the file just like adding key-value pairs to a dict:
|
793 | 794 | major_axis=date_range('1/1/2000', periods=5),
|
794 | 795 | minor_axis=['A', 'B', 'C', 'D'])
|
795 | 796 |
|
| 797 | + # store.put('s', s) is an equivalent method |
796 | 798 | store['s'] = s
|
| 799 | +
|
797 | 800 | store['df'] = df
|
| 801 | +
|
798 | 802 | store['wp'] = wp
|
| 803 | +
|
| 804 | + # the type of stored data |
| 805 | + store.handle.root.wp._v_attrs.pandas_type |
| 806 | +
|
799 | 807 | store
|
800 | 808 |
|
801 | 809 | In a current or later Python session, you can retrieve stored objects:
|
802 | 810 |
|
803 | 811 | .. ipython:: python
|
804 | 812 |
|
| 813 | + # store.get('df') is an equivalent method |
805 | 814 | store['df']
|
806 | 815 |
|
| 816 | +Delete the object specified by the key: |
| 817 | + |
| 818 | +.. ipython:: python |
| 819 | +
|
| 820 | + # store.remove('wp') is an equivalent method |
| 821 | + del store['wp'] |
| 822 | +
|
| 823 | + store |
| 824 | +
|
| 825 | +.. ipython:: python |
| 826 | + :suppress: |
| 827 | +
|
| 828 | + store.close() |
| 829 | + import os |
| 830 | + os.remove('store.h5') |
| 831 | +
|
| 832 | +
|
| 833 | +These stores are **not** appendable once written (though you can simply remove them and rewrite). Nor are they **queryable**; they must be retrieved in their entirety. |
| 834 | + |
| 835 | + |
| 836 | +Storing in Table format |
| 837 | +~~~~~~~~~~~~~~~~~~~~~~~ |
| 838 | + |
| 839 | +``HDFStore`` supports another ``PyTables`` format on disk, the ``table`` format. Conceptually a ``table`` is shaped |
| 840 | +very much like a DataFrame, with rows and columns. A ``table`` may be appended to in the same or other sessions. |
| 841 | +In addition, delete & query type operations are supported. You can create an index with ``create_table_index`` |
| 842 | +after data is already in the table (this may become automatic in the future or an option on appending/putting a ``table``). |
| 843 | + |
| 844 | +.. ipython:: python |
| 845 | + :suppress: |
| 846 | + :okexcept: |
| 847 | +
|
| 848 | + os.remove('store.h5') |
| 849 | +
|
| 850 | +.. ipython:: python |
| 851 | +
|
| 852 | + store = HDFStore('store.h5') |
| 853 | + df1 = df[0:4] |
| 854 | + df2 = df[4:] |
| 855 | + store.append('df', df1) |
| 856 | + store.append('df', df2) |
| 857 | + store.append('wp', wp) |
| 858 | + store |
| 859 | +
|
| 860 | + store.select('df') |
| 861 | +
|
| 862 | + # the type of stored data |
| 863 | + store.handle.root.df._v_attrs.pandas_type |
| 864 | +
|
| 865 | + store.create_table_index('df') |
| 866 | + store.handle.root.df.table |
| 867 | +
|
| 868 | +.. ipython:: python |
| 869 | + :suppress: |
| 870 | +
|
| 871 | + store.close() |
| 872 | + import os |
| 873 | + os.remove('store.h5') |
| 874 | +
|
| 875 | +
|
| 876 | +Querying objects stored in Table format |
| 877 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 878 | + |
| 879 | +``select`` and ``delete`` operations accept optional criteria that restrict the operation to |
| 880 | +a subset of the data. This allows one to have a very large on-disk table and retrieve only a portion of the data. |
| 881 | + |
| 882 | +A query is specified using the ``Term`` class under the hood. |
| 883 | + |
| 884 | + - 'index' and 'column' are supported indexers of a DataFrame |
| 885 | + - 'major_axis' and 'minor_axis' are supported indexers of the Panel |
| 886 | + |
| 887 | +Valid terms can be created from ``dict, list, tuple, or string``. Objects can be embedded as values. Allowed operations are: ``<, <=, >, >=, =``. ``=`` will be inferred as an implicit set operation (e.g. if 2 or more values are provided). The following are all valid terms. |
| 888 | + |
| 889 | + - ``dict(field = 'index', op = '>', value = '20121114')`` |
| 890 | + - ``('index', '>', '20121114')`` |
| 891 | + - ``'index>20121114'`` |
| 892 | + - ``('index', '>', datetime(2012,11,14))`` |
| 893 | + - ``('index', ['20121114','20121115'])`` |
| 894 | + - ``('major', '=', Timestamp('2012/11/14'))`` |
| 895 | + - ``('minor_axis', ['A','B'])`` |
| 896 | + |
| 897 | +Queries are built up using a list of ``Terms`` (currently only **anding** of terms is supported). An example query for a panel might be specified as follows. |
| 898 | +``['major_axis>20000102', ('minor_axis', '=', ['A','B']) ]``. This is roughly translated to: `major_axis must be greater than the date 20000102 and the minor_axis must be A or B` |
| 899 | + |
| 900 | +.. ipython:: python |
| 901 | +
|
| 902 | + store = HDFStore('store.h5') |
| 903 | + store.append('wp',wp) |
| 904 | + store.select('wp',[ 'major_axis>20000102', ('minor_axis', '=', ['A','B']) ]) |
| 905 | +
|
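The **anding** of terms described above can be modelled in plain Python. This is only a sketch of the semantics, not the ``Term`` implementation: the names ``OPS``, ``matches`` and ``select`` are hypothetical, rows are plain dicts, and dates are compared as ``YYYYMMDD`` strings.

```python
import operator

# Hypothetical names: this models the query semantics only, not pandas internals.
OPS = {'<': operator.lt, '<=': operator.le, '>': operator.gt,
       '>=': operator.ge, '=': operator.eq}


def matches(row, term):
    """Return True if a row (a dict of field -> value) satisfies one term."""
    field, op, value = term
    if op == '=' and isinstance(value, (list, tuple)):
        # '=' with several values acts as a set-membership test
        return row[field] in value
    return OPS[op](row[field], value)


def select(rows, terms):
    """Keep only the rows that satisfy every term (terms are ANDed)."""
    return [row for row in rows if all(matches(row, t) for t in terms)]


rows = [
    {'major_axis': '20000101', 'minor_axis': 'A'},
    {'major_axis': '20000103', 'minor_axis': 'A'},
    {'major_axis': '20000103', 'minor_axis': 'C'},
    {'major_axis': '20000104', 'minor_axis': 'B'},
]
terms = [('major_axis', '>', '20000102'), ('minor_axis', '=', ['A', 'B'])]
result = select(rows, terms)
```

Only the second and fourth rows remain in ``result``: the first fails the date term and the third fails the ``minor_axis`` term.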
| 906 | +Delete from objects stored in Table format |
| 907 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 908 | + |
| 909 | +.. ipython:: python |
| 910 | +
|
| 911 | + store.remove('wp', 'major_axis>20000102' ) |
| 912 | + store.select('wp') |
| 913 | +
|
807 | 914 | .. ipython:: python
|
808 | 915 | :suppress:
|
809 | 916 |
|
810 | 917 | store.close()
|
811 | 918 | import os
|
812 | 919 | os.remove('store.h5')
|
813 | 920 |
|
| 921 | +Notes & Caveats |
| 922 | +~~~~~~~~~~~~~~~ |
| 923 | + |
| 924 | + - Selection by items (the top level panel dimension) is not possible; you always get all of the items in the returned Panel |
| 925 | + - ``PyTables`` only supports fixed-width string columns in ``tables``. The sizes of a string based indexing column (e.g. *index* or *minor_axis*) are determined as the maximum size of the elements in that axis or by passing the ``min_itemsize`` on the first table creation. If subsequent appends introduce elements in the indexing axis that are larger than the supported indexer, an Exception will be raised (otherwise you could have a silent truncation of these indexers, leading to loss of information). |
| 926 | + - Once a ``table`` is created its items (Panel) / columns (DataFrame) are fixed; only exactly the same columns can be appended |
| 927 | + - You cannot append to, select from, or delete from a non-table (table creation is determined on the first append, or by passing ``table=True`` in a put operation) |
| 928 | + |
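The fixed-width string caveat above can be illustrated with a toy model. ``FixedWidthColumn`` is a hypothetical class written for this sketch, not part of ``PyTables`` or pandas, and the use of ``ValueError`` is an assumption; it only mimics the rule that the width is fixed on creation and that longer values are rejected rather than truncated.

```python
class FixedWidthColumn:
    """Toy model of a fixed-width string column (not a PyTables class)."""

    def __init__(self, values, min_itemsize=0):
        # The width is fixed on creation: the longest initial value,
        # or min_itemsize if that is larger.
        self.itemsize = max([len(v) for v in values] + [min_itemsize])
        self.values = list(values)

    def append(self, values):
        too_long = [v for v in values if len(v) > self.itemsize]
        if too_long:
            # Refuse the append rather than silently truncate
            raise ValueError(
                "values %r exceed the column width of %d"
                % (too_long, self.itemsize))
        self.values.extend(values)


narrow = FixedWidthColumn(['A', 'B'])
try:
    narrow.append(['CCC'])
    raised = False
except ValueError:
    raised = True

# Reserving room up front lets the longer value through
roomy = FixedWidthColumn(['A', 'B'], min_itemsize=10)
roomy.append(['CCC'])
```

Reserving space on creation, as ``min_itemsize`` does in the note above, is what allows the later append to succeed.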
| 929 | +Performance |
| 930 | +~~~~~~~~~~~ |
| 931 | + |
| 932 | + - ``Tables`` come with a performance penalty as compared to regular stores. The benefit is the ability to append/delete and query (potentially very large amounts of data). |
| 933 | + Write times are generally longer as compared with regular stores. Query times can be quite fast, especially on an indexed axis. |
| 934 | + - ``Tables`` can (as of 0.10.0) be expressed as different types. |
814 | 935 |
|
815 |
| -.. Storing in Table format |
816 |
| -.. ~~~~~~~~~~~~~~~~~~~~~~~ |
| 936 | + - ``AppendableTable`` which is a similar table to past versions (this is the default). |
| 937 | + - ``WORMTable`` (pending implementation) - is available to facilitate very fast writing of tables that are also queryable (but CANNOT support appends) |
817 | 938 |
|
818 |
| -.. Querying objects stored in Table format |
819 |
| -.. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 939 | + - To delete a lot of data, it is sometimes better to erase the table and rewrite it. ``PyTables`` tends to increase the file size with deletions |
| 940 | + - In general it is best to store Panels with the most frequently selected dimension in the minor axis and a time/date like dimension in the major axis, but this is not required. Panels can have any major_axis and minor_axis type that is a valid Panel indexer. |
| 941 | + - No dimensions are currently indexed automagically (in the ``PyTables`` sense); these require an explicit call to ``create_table_index`` |
| 942 | + - ``Tables`` offer better performance when compressed after writing them (as opposed to turning on compression at the very beginning); |
| 943 | + use the ``PyTables`` utility ``ptrepack`` to rewrite the file (this can also change the compression method) |
| 944 | + - Duplicate rows can be written, but are filtered out in selection (with the last items being selected; thus a table is unique on major, minor pairs) |
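The last point (duplicates are filtered on selection, with the last written row winning) can be sketched in plain Python. ``deduplicate`` is a hypothetical helper and rows are modelled as ``(major, minor, value)`` tuples; this shows the described behaviour, not how ``HDFStore`` implements it.

```python
def deduplicate(rows):
    """Keep one row per (major, minor) pair; later rows replace earlier ones."""
    latest = {}
    for major, minor, value in rows:
        latest[(major, minor)] = value
    return [(major, minor, value) for (major, minor), value in latest.items()]


written = [
    ('20000101', 'A', 1.0),
    ('20000101', 'B', 2.0),
    ('20000101', 'A', 9.0),  # same (major, minor) pair, written later
]
selected = deduplicate(written)
```

Here ``selected`` holds two rows, and the ``('20000101', 'A')`` pair carries the later value ``9.0``.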