|
 How to use CrateDB together with popular open-source DataFrame libraries.

|
-(dask)=
 ## Dask
-
-:::{rubric} About
-:::
-[Dask] is a parallel computing library for analytics with task scheduling.
-It is built on top of the Python programming language, making it easy to scale
-the Python libraries that you know and love, like NumPy, pandas, and scikit-learn.
-
-```{div}
-:style: "float: right"
-[{w=180px}](https://www.dask.org/)
-```
-
-- [Dask DataFrames] help you process large tabular data by parallelizing pandas,
-  either on your laptop for larger-than-memory computing, or on a distributed
-  cluster of computers.
-
-- [Dask Futures], implementing a real-time task framework, allow you to scale
-  generic Python workflows across a Dask cluster with minimal code changes,
-  by extending Python's `concurrent.futures` interface.
-
-```{div}
-:style: "clear: both"
-```
-
-:::{rubric} Learn
+:::{seealso}
+Please navigate to the dedicated page about {ref}`dask`.
 :::
-- [Guide to efficient data ingestion to CrateDB with pandas and Dask]
-- [Efficient batch/bulk INSERT operations with pandas, Dask, and SQLAlchemy]
-- [Import weather data using Dask]
-- [Dask code examples]
-

|
-(pandas)=
 ## pandas
-
-:::{rubric} About
-:::
-
-```{div}
-:style: "float: right"
-[{w=180px}](https://pandas.pydata.org/)
-```
-
-[pandas] is a fast, powerful, flexible, and easy-to-use open-source data analysis
-and manipulation tool, built on top of the Python programming language.
-
-Pandas (stylized as pandas) is a software library written for the Python programming
-language for data manipulation and analysis. In particular, it offers data structures
-and operations for manipulating numerical tables and time series.
-
-:::{rubric} Data Model
-:::
-- Pandas is built around data structures called Series and DataFrames. Data for these
-  collections can be imported from various file formats such as comma-separated values,
-  JSON, Parquet, SQL database tables or queries, and Microsoft Excel.
-- A Series is a 1-dimensional data structure built on top of NumPy's array.
-- Pandas includes support for time series, such as the ability to interpolate values
-  and filter using a range of timestamps.
-- By default, a Pandas index is a series of integers ascending from 0, similar to the
-  indices of Python arrays. However, indices can use any NumPy data type, including
-  floating point, timestamps, or strings.
-- Pandas supports hierarchical indices with multiple values per data point. An index
-  with this structure, called a "MultiIndex", allows a single DataFrame to represent
-  multiple dimensions, similar to a pivot table in Microsoft Excel. Each level of a
-  MultiIndex can be given a unique name.
-
-```{div}
-:style: "clear: both"
-```
-
-:::{rubric} Learn
+:::{seealso}
+Please navigate to the dedicated page about {ref}`pandas`.
 :::
-- [Guide to efficient data ingestion to CrateDB with pandas]
-- [Importing Parquet files into CrateDB using Apache Arrow and SQLAlchemy]
-- [pandas code examples]
-- [From data storage to data analysis: Tutorial on CrateDB and pandas]

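Note on the removed data-model bullets: each one (Series, time-series interpolation, MultiIndex) maps directly onto a few lines of pandas. A minimal sketch with illustrative data:

```python
import pandas as pd

# Series: a 1-dimensional labeled array built on top of NumPy.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Time-series support: timestamp indices and interpolation of gaps.
ts = pd.Series([1.0, None, 3.0],
               index=pd.date_range("2024-01-01", periods=3, freq="D"))
filled = ts.interpolate()  # the missing middle value becomes 2.0

# MultiIndex: multiple index values per data point, similar to a
# pivot table; each level carries its own name.
idx = pd.MultiIndex.from_product([["x", "y"], [2023, 2024]],
                                 names=["group", "year"])
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)
per_group = df.groupby(level="group")["value"].sum()
```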
|

 ## Polars
 :::{seealso}
 Please navigate to the dedicated page about {ref}`polars`.
 :::
-
-
-[Apache Arrow]: https://arrow.apache.org/
-[Dask]: https://www.dask.org/
-[Dask DataFrames]: https://docs.dask.org/en/latest/dataframe.html
-[Dask Futures]: https://docs.dask.org/en/latest/futures.html
-[pandas]: https://pandas.pydata.org/
-
-[Dask code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/dask
-[Efficient batch/bulk INSERT operations with pandas, Dask, and SQLAlchemy]: https://cratedb.com/docs/python/en/latest/by-example/sqlalchemy/dataframe.html
-[From data storage to data analysis: Tutorial on CrateDB and pandas]: https://community.cratedb.com/t/from-data-storage-to-data-analysis-tutorial-on-cratedb-and-pandas/1440
-[Guide to efficient data ingestion to CrateDB with pandas]: https://community.cratedb.com/t/guide-to-efficient-data-ingestion-to-cratedb-with-pandas/1541
-[Guide to efficient data ingestion to CrateDB with pandas and Dask]: https://community.cratedb.com/t/guide-to-efficient-data-ingestion-to-cratedb-with-pandas-and-dask/1482
-[Import weather data using Dask]: https://github.com/crate/cratedb-examples/blob/main/topic/timeseries/dask-weather-data-import.ipynb
-[Importing Parquet files into CrateDB using Apache Arrow and SQLAlchemy]: https://community.cratedb.com/t/importing-parquet-files-into-cratedb-using-apache-arrow-and-sqlalchemy/1161
-[pandas code examples]: https://github.com/crate/cratedb-examples/tree/main/by-dataframe/pandas
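Note on the removed ingestion guides: they center on chunked `DataFrame.to_sql()` inserts through SQLAlchemy. A hedged sketch of that general pattern follows; SQLite stands in for CrateDB here so the snippet runs standalone (a real CrateDB setup would use a `crate://` engine URL via the `crate` SQLAlchemy dialect), and table and column names are illustrative.

```python
import pandas as pd
import sqlalchemy as sa

# SQLite in-memory engine substituted for CrateDB to keep the sketch
# self-contained; swap in e.g. "crate://localhost:4200" for CrateDB.
engine = sa.create_engine("sqlite:///:memory:")

df = pd.DataFrame({"id": range(1000),
                   "value": [i * 0.5 for i in range(1000)]})

# chunksize bounds memory per round trip; method="multi" packs many
# rows into each INSERT statement, reducing round trips.
df.to_sql("readings", engine, if_exists="replace", index=False,
          chunksize=250, method="multi")

with engine.connect() as conn:
    count = conn.execute(sa.text("SELECT count(*) FROM readings")).scalar_one()
print(count)  # 1000
```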