ENH: added regex argument to Series.str.split #44185
@@ -308,21 +308,38 @@ def f(x):

         return self._str_map(f)

-    def _str_split(self, pat=None, n=-1, expand=False):
+    def _str_split(
+        self,
+        pat: str | re.Pattern | None = None,
+        n=-1,
+        expand=False,
+        regex: bool | None = None,
+    ):
Review comments on this change:

would be nice to factor this logic of pattern stuff into a common function to share with str_replace

pandas/pandas/core/strings/object_array.py Lines 152 to 160 in b0992ee
Are you referring to this part? Unfortunately, str_replace and str_split handle arguments quite differently. str_replace has two additional arguments [...] In a separate set of PRs, I can do the following to str_split [...]
But for now, I think str_replace and str_split are not similar enough to share a common logic-handling function.

then shouldn't we ad [...] yes there is weird logic by using [...]

I did not add the len(pat) logic - it has always been there. I purposely didn't cut it out, to maintain current behavior. I think that we should remove it in the future.
pandas/pandas/core/strings/object_array.py Lines 317 to 325 in 2fa2d5c
In this PR, I am simply adding the regex flag, as requested by several issues. I can go ahead and add the case and flags if you think that is a good idea. Thanks

I saw that you didn't change the current [...]

I'm happy to get rid of the len(pat) logic and thus the [...] If you guys want me to go ahead, what should the [...]

@jreback what do you think about a breaking change here? I would be inclined to set the default to [...]

While I agree that removing this logic would be great, I'd be worried about changing the default to [...]

since this is a new method we could change things, but yeah, maybe this is too much for now. ok, what i think we should do is this: factor to a common method and use it here. when we deprecate this, it will deprecate in both places (yes, even though this is a new method, it is fine). it's at least consistent.
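For illustration, the "common method" requested in the thread above might look roughly like the sketch below. The name `resolve_split_pattern` is hypothetical and does not exist in pandas; the body only mirrors the three-way `regex` handling shown in this PR's diff (explicit `True`, explicit `False`, and `None` falling back to the legacy `len(pat) == 1` rule). It says nothing about how `str_replace` would need to be adapted.

```python
from __future__ import annotations

import re


def resolve_split_pattern(
    pat: str | re.Pattern, regex: bool | None = None
) -> str | re.Pattern:
    """Hypothetical helper: return a literal str or a compiled regex."""
    if regex is True or isinstance(pat, re.Pattern):
        # re.compile returns an already-compiled pattern unchanged.
        return re.compile(pat)
    if regex is False:
        # Caller explicitly asked for a literal separator.
        return pat
    # regex is None: keep the legacy rule, where a single character
    # is a literal and anything longer is treated as a regex.
    return pat if len(pat) == 1 else re.compile(pat)


if __name__ == "__main__":
    for args in [(".", None), (".", True), ("a.b", False), ("a.b", None)]:
        print(args, "->", resolve_split_pattern(*args))
```

With a helper of this shape, a later deprecation of the `len(pat) == 1` special case would only have to touch one place, which is the consistency argument made in the last comment.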
         if pat is None:
             if n is None or n == 0:
                 n = -1
             f = lambda x: x.split(pat, n)
         else:
-            if len(pat) == 1:
-                if n is None or n == 0:
-                    n = -1
-                f = lambda x: x.split(pat, n)
+            new_pat: str | re.Pattern
+            if regex is True or isinstance(pat, re.Pattern):
+                new_pat = re.compile(pat)
+            elif regex is False:
+                new_pat = pat
+            # regex is None so link to old behavior #43563
             else:
+                if len(pat) == 1:
+                    new_pat = pat
+                else:
+                    new_pat = re.compile(pat)
+
+            if isinstance(new_pat, re.Pattern):
                 if n is None or n == -1:
                     n = 0
-                regex = re.compile(pat)
-                f = lambda x: regex.split(x, maxsplit=n)
+                f = lambda x: new_pat.split(x, maxsplit=n)
+            else:
+                if n is None or n == 0:
+                    n = -1
+                f = lambda x: x.split(pat, n)
         return self._str_map(f, dtype=object)

     def _str_rsplit(self, pat=None, n=-1):
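The two `n` adjustments in the diff exist because `str.split` and `re.Pattern.split` use different sentinels for "no limit": `-1` for the former, `0` for the latter. The standalone sketch below shows that normalisation on a single string, outside pandas. The function name `split_one` is an assumption made for this example and is not part of the PR.

```python
from __future__ import annotations

import re


def split_one(
    text: str, pat: str | re.Pattern | None = None, n: int | None = -1
) -> list[str]:
    """Split one string, treating None, 0 and -1 as "no limit"."""
    if isinstance(pat, re.Pattern):
        # re.Pattern.split: maxsplit=0 means "split everywhere".
        maxsplit = 0 if n is None or n == -1 else n
        return pat.split(text, maxsplit=maxsplit)
    # str.split: maxsplit=-1 means "split everywhere";
    # pat=None splits on runs of whitespace.
    maxsplit = -1 if n is None or n == 0 else n
    return text.split(pat, maxsplit)


if __name__ == "__main__":
    print(split_one("a.b.c", "."))               # literal separator
    print(split_one("a.b.c", re.compile(r"\.")))  # escaped regex, same pieces
    print(split_one("a.b.c", re.compile(".")))    # unescaped regex matches every character
    print(split_one("a.b.c", ".", n=1))           # at most one split
```

The third call is the case the new `regex` argument lets callers control explicitly: an unescaped `.` compiled as a regex consumes every character and leaves only empty strings.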