@@ -51,37 +51,24 @@ familiar with some of these internal details, particular around performance and
memory use, and so the degree to which users are impacted will vary quite a
lot.

- Key areas of work
- =================
-
- Possible changes or improvements to pandas's internals fall into a number of
- different buckets to be explored in great detail:
-
- * **Decoupling from NumPy while preserving interoperability**: by eliminating
-   the presumption that pandas objects internally must contain data stored in
-   NumPy ``ndarray`` objects, we will be able to bring more consistency to
-   pandas's semantics and enable the core developers to extend pandas more
-   cleanly with new data types, data structures, and computational semantics.
- * **Exposing a pandas Cython and/or C/C++ API to other Python library
-   developers**: the internals of Series and DataFrame are only weakly
-   accessible in other developers' native code. At minimum, we wish to better
-   enable developers to construct the precise data structures / memory
-   representation that fill the insides of Series and DataFrame.
- * **Improving user control and visibility of memory use**: pandas's memory use,
-   as a result of its internal implementation, can frequently be opaque to the
-   user or outright unpredictable.
- * **Improving performance and system utilization**: We aim to improve both the
-   micro (operations that take < 1 ms) and macro (all other operations)
-   performance of pandas across the board. As part of this, we aim to make it
-   easier for pandas's core developers to leverage multicore systems to
-   accelerate computations (without running into any of Python's well-known
-   concurrency limitations)
- * **Removal of deprecated / underutilized functionality**: As the Python data
-   ecosystem has grown, a number of areas of pandas (e.g. plotting and datasets
-   with more than 2 dimensions) may be better served by other open source
-   projects. Also, functionality that has been explicitly deprecated or
-   discouraged from use (like the ``.ix`` indexing operator) would ideally be
-   removed.
+ Goals
+ =====
+
+ Some high-level goals of the pandas 2.0 plan include the following:
+
+ * Fixing long-standing limitations or inconsistencies in missing data: null
+   values in integer and boolean data, and a more consistent notion of null /
+   NA.
+ * Improved performance and utilization of multicore systems
+ * Better user control / visibility of memory usage (which can be opaque and
+   difficult to control)
+ * Clearer semantics around non-NumPy data types, and permitting new pandas-only
+   data types to be added
+ * Exposing a "libpandas" C/C++ API to other Python library developers: the
+   internals of Series and DataFrame are only weakly accessible in other
+   developers' native code. This has been a limitation for scikit-learn and
+   other projects requiring C or Cython-level access to pandas object data.
+ * Removal of deprecated functionality

Non-goals / FAQ
===============