Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Deprecate regex=True default in Series.str.replace #36695

Merged
merged 45 commits into from
Oct 10, 2020
Merged
Show file tree
Hide file tree
Changes from 20 commits
Commits
Show all changes
45 commits
Select commit Hold shift + click to select a range
52f6d78
API: Deprecate regex=True default in Series.str.replace
Sep 28, 2020
3b9bcb2
Format
Sep 28, 2020
c51e837
Stacklevel
Sep 28, 2020
3761625
Test
Sep 28, 2020
7583b7f
Test name
Sep 28, 2020
8e0d1a5
Link discussion
Sep 28, 2020
79e4400
Set regex
Sep 28, 2020
ecc5786
Update
Sep 28, 2020
cf659b0
Merge remote-tracking branch 'upstream/master' into replace-regex-def…
Sep 28, 2020
628cfa3
Update
Sep 28, 2020
f216022
Fix
Sep 28, 2020
e6c79a3
Merge remote-tracking branch 'upstream/master' into replace-regex-def…
Sep 28, 2020
e6b3e0f
Check for str
Sep 28, 2020
e3ce080
Merge remote-tracking branch 'upstream/master' into replace-regex-def…
Sep 28, 2020
752d322
Merge remote-tracking branch 'upstream/master' into replace-regex-def…
Sep 29, 2020
ee67284
Doc updates
Sep 29, 2020
c832a21
Nit
Sep 29, 2020
09b07e6
Silence some warnings
Sep 29, 2020
c8541da
Edit
Sep 29, 2020
bd31857
Edit
Sep 29, 2020
377a2ba
Question mark
Sep 29, 2020
d6b4fbe
Update doc
Sep 29, 2020
84dfe71
Merge remote-tracking branch 'upstream/master' into replace-regex-def…
Sep 30, 2020
5cfbc04
Merge remote-tracking branch 'upstream/master' into replace-regex-def…
Sep 30, 2020
1b53051
Add back
Sep 30, 2020
1931c31
Oops
Sep 30, 2020
20bfe16
Note
Sep 30, 2020
6ff5955
Merge remote-tracking branch 'upstream/master' into replace-regex-def…
Oct 1, 2020
cea44a7
Fix
Oct 1, 2020
4376bca
Merge remote-tracking branch 'upstream/master' into replace-regex-def…
Oct 2, 2020
482d5c3
Merge remote-tracking branch 'upstream/master' into replace-regex-def…
Oct 3, 2020
cd18347
Merge remote-tracking branch 'upstream/master' into replace-regex-def…
Oct 7, 2020
f872011
Update warning
Oct 7, 2020
c0a473e
rst lint
Oct 7, 2020
727986e
Merge remote-tracking branch 'upstream/master' into replace-regex-def…
Oct 8, 2020
f49f778
Merge remote-tracking branch 'upstream/master' into replace-regex-def…
Oct 8, 2020
d3d155a
Merge remote-tracking branch 'upstream/master' into replace-regex-def…
Oct 9, 2020
89013db
Nit + emphasis
Oct 9, 2020
e78017c
regex=False
Oct 9, 2020
0150b2b
Merge remote-tracking branch 'upstream/master' into replace-regex-def…
Oct 10, 2020
e396ce5
Update
Oct 10, 2020
e799b12
fix
Oct 10, 2020
9f7545f
rst fix
Oct 10, 2020
6be90a4
Merge remote-tracking branch 'upstream/master' into replace-regex-def…
Oct 10, 2020
8a4a833
Make a warning
Oct 10, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
31 changes: 18 additions & 13 deletions doc/source/user_guide/text.rst
Expand Up @@ -262,25 +262,30 @@ i.e., from the end of the string to the beginning of the string:
'', np.nan, 'CABA', 'dog', 'cat'],
dsaxton marked this conversation as resolved.
Show resolved Hide resolved
dtype="string")
s3
s3.str.replace('^.a|dog', 'XX-XX ', case=False)
s3.str.replace('^.a|dog', 'XX-XX ', case=False, regex=True)

Some caution must be taken to keep regular expressions in mind! For example, the
following code will cause trouble because of the regular expression meaning of
``$``:
Some caution must be taken when dealing with regular expressions! The current behavior
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can make this a note / warning

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also add a versionchanged tag

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a versionchanged tag. Do we also need one in the docstring?

I'm not sure what's meant by note / warning since I've added a whatsnew note and warning. Do you mean put the text itself in the whatsnew?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no i mean a ..note:: a sphinx-note (which puts a highlite box around this)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could also do a ..warning:: which is a red-box and more prominent, but either way want to call out this

Copy link
Member Author

@dsaxton dsaxton Oct 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this a .. warning
image

is to treat single character patterns as literal strings, even when ``regex`` is set
to ``True``. (This behavior is deprecated and will be removed in a future version so
that the ``regex`` keyword is always respected.) For example, the following code will
cause trouble because of the regular expression meaning of ``$``:

.. ipython:: python

# Consider the following badly formatted financial data
dollars = pd.Series(['12', '-$10', '$10,000'], dtype="string")

# This does what you'd naively expect:
dollars.str.replace('$', '')
# Here $ is treated as a literal character
dollars.str.replace('$', '', regex=True)
dsaxton marked this conversation as resolved.
Show resolved Hide resolved

# But this doesn't:
dollars.str.replace('-$', '-')
# But here it is not
dollars.str.replace('-$', '-', regex=True)

# We need to escape the special character (for >1 len patterns)
dollars.str.replace(r'-\$', '-')
dollars.str.replace(r'-\$', '-', regex=True)

# Or set regex equal to False
dollars.str.replace('-$', '-', regex=False)
dsaxton marked this conversation as resolved.
Show resolved Hide resolved

If you do want literal replacement of a string (equivalent to
:meth:`str.replace`), you can set the optional ``regex`` parameter to
Expand All @@ -290,7 +295,7 @@ and ``repl`` must be strings:
.. ipython:: python

# These lines are equivalent
dollars.str.replace(r'-\$', '-')
dollars.str.replace(r'-\$', '-', regex=True)
dollars.str.replace('-$', '-', regex=False)

The ``replace`` method can also take a callable as replacement. It is called
Expand All @@ -306,7 +311,7 @@ positional argument (a regex object) and return a string.
return m.group(0)[::-1]

pd.Series(['foo 123', 'bar baz', np.nan],
dtype="string").str.replace(pat, repl)
dtype="string").str.replace(pat, repl, regex=True)

# Using regex groups
pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
Expand All @@ -315,7 +320,7 @@ positional argument (a regex object) and return a string.
return m.group('two').swapcase()

pd.Series(['Foo Bar Baz', np.nan],
dtype="string").str.replace(pat, repl)
dtype="string").str.replace(pat, repl, regex=True)

The ``replace`` method also accepts a compiled regular expression object
from :func:`re.compile` as a pattern. All flags should be included in the
Expand All @@ -325,7 +330,7 @@ compiled regular expression object.

import re
regex_pat = re.compile(r'^.a|dog', flags=re.IGNORECASE)
s3.str.replace(regex_pat, 'XX-XX ')
s3.str.replace(regex_pat, 'XX-XX ', regex=True)

Including a ``flags`` argument when calling ``replace`` with a compiled
regular expression object will raise a ``ValueError``.
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/reshape/melt.py
Expand Up @@ -483,7 +483,7 @@ def melt_stub(df, stub: str, i, j, value_vars, sep: str):
var_name=j,
)
newdf[j] = Categorical(newdf[j])
newdf[j] = newdf[j].str.replace(re.escape(stub + sep), "")
newdf[j] = newdf[j].str.replace(re.escape(stub + sep), "", regex=True)

# GH17627 Cast numerics suffixes to int/float
newdf[j] = to_numeric(newdf[j], errors="ignore")
Expand Down
15 changes: 14 additions & 1 deletion pandas/core/strings.py
Expand Up @@ -2837,7 +2837,20 @@ def fullmatch(self, pat, case=True, flags=0, na=np.nan):

@copy(str_replace)
@forbid_nonstring_types(["bytes"])
def replace(self, pat, repl, n=-1, case=None, flags=0, regex=True):
def replace(self, pat, repl, n=-1, case=None, flags=0, regex=None):
if regex is None:
if (
isinstance(pat, str)
and len(pat) > 1
and any(c in pat for c in ".+*|^$[](){}\\")
dsaxton marked this conversation as resolved.
Show resolved Hide resolved
):
# warn only in cases where regex behavior would differ from literal
msg = (
"The default value of regex will change from "
"True to False in a future version."
)
warnings.warn(msg, FutureWarning, stacklevel=3)
regex = True
result = str_replace(
self._parent, pat, repl, n=n, case=case, flags=flags, regex=regex
)
Expand Down
6 changes: 6 additions & 0 deletions pandas/tests/series/methods/test_replace.py
Expand Up @@ -449,3 +449,9 @@ def test_replace_with_compiled_regex(self):
result = s.replace({regex: "z"}, regex=True)
expected = pd.Series(["z", "b", "c"])
tm.assert_series_equal(result, expected)

def test_str_replace_regex_default_raises_warning(self):
# https://github.com/pandas-dev/pandas/pull/24809
s = pd.Series(["a", "b", "c"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check the messages on this warning

with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
s.str.replace("^.$", "")
24 changes: 13 additions & 11 deletions pandas/tests/test_strings.py
Expand Up @@ -995,11 +995,11 @@ def test_casemethods(self):
def test_replace(self):
values = Series(["fooBAD__barBAD", np.nan])

result = values.str.replace("BAD[_]*", "")
result = values.str.replace("BAD[_]*", "", regex=True)
exp = Series(["foobar", np.nan])
tm.assert_series_equal(result, exp)

result = values.str.replace("BAD[_]*", "", n=1)
result = values.str.replace("BAD[_]*", "", n=1, regex=True)
exp = Series(["foobarBAD", np.nan])
tm.assert_series_equal(result, exp)

Expand All @@ -1008,15 +1008,17 @@ def test_replace(self):
["aBAD", np.nan, "bBAD", True, datetime.today(), "fooBAD", None, 1, 2.0]
)

rs = Series(mixed).str.replace("BAD[_]*", "")
rs = Series(mixed).str.replace("BAD[_]*", "", regex=True)
xp = Series(["a", np.nan, "b", np.nan, np.nan, "foo", np.nan, np.nan, np.nan])
assert isinstance(rs, Series)
tm.assert_almost_equal(rs, xp)

# flags + unicode
values = Series([b"abcd,\xc3\xa0".decode("utf-8")])
exp = Series([b"abcd, \xc3\xa0".decode("utf-8")])
result = values.str.replace(r"(?<=\w),(?=\w)", ", ", flags=re.UNICODE)
result = values.str.replace(
r"(?<=\w),(?=\w)", ", ", flags=re.UNICODE, regex=True
)
tm.assert_series_equal(result, exp)

# GH 13438
Expand All @@ -1034,7 +1036,7 @@ def test_replace_callable(self):

# test with callable
repl = lambda m: m.group(0).swapcase()
result = values.str.replace("[a-z][A-Z]{2}", repl, n=2)
result = values.str.replace("[a-z][A-Z]{2}", repl, n=2, regex=True)
exp = Series(["foObaD__baRbaD", np.nan])
tm.assert_series_equal(result, exp)

Expand All @@ -1060,7 +1062,7 @@ def test_replace_callable(self):
values = Series(["Foo Bar Baz", np.nan])
pat = r"(?P<first>\w+) (?P<middle>\w+) (?P<last>\w+)"
repl = lambda m: m.group("middle").swapcase()
result = values.str.replace(pat, repl)
result = values.str.replace(pat, repl, regex=True)
exp = Series(["bAR", np.nan])
tm.assert_series_equal(result, exp)

Expand All @@ -1070,11 +1072,11 @@ def test_replace_compiled_regex(self):

# test with compiled regex
pat = re.compile(r"BAD[_]*")
result = values.str.replace(pat, "")
result = values.str.replace(pat, "", regex=True)
exp = Series(["foobar", np.nan])
tm.assert_series_equal(result, exp)

result = values.str.replace(pat, "", n=1)
result = values.str.replace(pat, "", n=1, regex=True)
exp = Series(["foobarBAD", np.nan])
tm.assert_series_equal(result, exp)

Expand All @@ -1083,7 +1085,7 @@ def test_replace_compiled_regex(self):
["aBAD", np.nan, "bBAD", True, datetime.today(), "fooBAD", None, 1, 2.0]
)

rs = Series(mixed).str.replace(pat, "")
rs = Series(mixed).str.replace(pat, "", regex=True)
xp = Series(["a", np.nan, "b", np.nan, np.nan, "foo", np.nan, np.nan, np.nan])
assert isinstance(rs, Series)
tm.assert_almost_equal(rs, xp)
Expand Down Expand Up @@ -1121,7 +1123,7 @@ def test_replace_literal(self):
# GH16808 literal replace (regex=False vs regex=True)
values = Series(["f.o", "foo", np.nan])
exp = Series(["bao", "bao", np.nan])
result = values.str.replace("f.", "ba")
result = values.str.replace("f.", "ba", regex=True)
tm.assert_series_equal(result, exp)

exp = Series(["bao", "foo", np.nan])
Expand Down Expand Up @@ -3331,7 +3333,7 @@ def test_replace_moar(self):
)
tm.assert_series_equal(result, expected)

result = s.str.replace("^.a|dog", "XX-XX ", case=False)
result = s.str.replace("^.a|dog", "XX-XX ", case=False, regex=True)
expected = Series(
[
"A",
Expand Down