fix percent escape not working when not at the beginning of the line #383

cocolato · 2024-01-17T11:02:29Z

Fixes: #323

The Lexer currently generates the wrong result when there is no \n before %%.

Case:

from mako.template import Template


template = """%% do something
%%% do something
if <some condition>:
    %%%% do something
        """


print(Template(template).render())

The result before fix:

%% do something
%% do something
if <some condition>:
   %%%% do something

The result after fix:

% do something
%% do something
if <some condition>:
   %%% do something

cocolato · 2024-01-17T11:15:44Z

Hi @zzzeek, can you please take a look? The fix passed all the tests on my local machine.

zzzeek · 2024-01-17T13:51:38Z

this does change the output of rendering so I'm a little concerned about old templates relying on the broken behavior. I can just put it out there as 1.3.1 and see what happens

sqla-tester

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision 773b6bd of this pull request into gerrit so we can run tests and reviews and stuff

sqla-tester · 2024-01-17T13:51:49Z

New Gerrit review created for change 773b6bd: https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111

zzzeek

I have some simplifications to the regex, can you try what I suggested? the tests all pass

zzzeek · 2024-01-17T14:32:29Z

mako/lexer.py

@@ -357,10 +357,12 @@ def match_text(self):
            r"""
                (.*?)         # anything, followed by:
                (
-                 (?<=\n)(?=[ \t]*(?=%|\#\#)) # an eval or line-based
+                 ((?<=\n)|^)(?=[ \t]*(?=%|\#\#)) # an eval or line-based


i dont think the |^ is needed

also we can remove the "%" from this line

Thanks for the review! I think the '%' here cannot be removed. Without it, the regex in match_text would fail to consume the newline character '\n' in cases such as \n %for, and the regular expression in match_control_line would then fail to recognize these control keywords.

mako/mako/lexer.py

Lines 424 to 429 in dc66614

    def match_control_line(self):
        match = self.match(
            r"(?<=^)[\t ]*(%(?!%)|##)[\t ]*((?:(?:\\\r?\n)|[^\r\n])*)"
            r"(?:\r?\n|\Z)",
            re.M,
        )
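
To illustrate the point about \n %for, here is a minimal sketch that applies the control-line regex quoted above outside the lexer. Only the pattern comes from lexer.py; the helper name `control_line_at` and the sample strings are made up for illustration.

```python
import re

# Pattern copied from match_control_line in mako/lexer.py (dc66614).
CONTROL_LINE = re.compile(
    r"(?<=^)[\t ]*(%(?!%)|##)[\t ]*((?:(?:\\\r?\n)|[^\r\n])*)"
    r"(?:\r?\n|\Z)",
    re.M,
)


def control_line_at(template, position):
    """Hypothetical helper: return (marker, text) if a control line or
    comment starts at `position`, else None."""
    match = CONTROL_LINE.match(template, position)
    return match.groups() if match else None


template = "\n %for x in y:\n"

# Position 1 is just after the newline, so the lookbehind on ^ succeeds.
after_newline = control_line_at(template, 1)

# Position 0 is also a line start, but the newline found there is
# neither whitespace from [\t ] nor a % / ## marker.
at_newline = control_line_at(template, 0)

# A doubled percent is rejected by the (?!%) lookahead.
escaped = control_line_at("%% not a control line\n", 0)
```

In this sketch the control line is only recognized once the scan position sits directly after the newline, which is why the preceding text match has to consume that newline first.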

zzzeek · 2024-01-17T14:34:57Z

mako/lexer.py

                                             # comment preceded by a
                                             # consumed newline and whitespace
                 |
+                 (?<!%)(?=%%+)


this can read:

(?<!%)(?=%%+) # consume the first percent sign out of a group of percent signs

so the overall block looks like:

        match = self.match(
            r"""
                (.*?)         # anything, followed by:
                (
                 (?<=\n)(?=[ \t]*(?=\#\#)) # an eval or line-based
                                             # comment, preceded by a
                                             # consumed newline and whitespace
                 |
                 (?<!%)(?=%%+) # consume the first percent sign out of a group of percent signs
                 |
                 (?=\${)      # an expression
                 |
                 (?=</?[%&])  # a substitution or block or call start or end
                              # - don't consume
                 |
                 (\\\r?\n)    # an escaped newline  - throw away
                 |
                 \Z           # end of string
                )""",
            re.X | re.S,
        )

cocolato · 2024-01-18T03:40:05Z

At the same time, I found that the current fix is incorrect and generates results that are missing spaces in some escape percent cases.

Case:

from mako.template import Template

# 4 spaces before %
template = """
    %%%% do something
"""
print(Template(template).render())

The result has only 3 spaces before "%":

   %%% do something

cocolato · 2024-01-18T03:45:26Z

The logic of the code here will consume one space:

mako/mako/lexer.py

Lines 360 to 362 in dc66614

    
                 (?<=\n)(?=[ \t]*(?=%|\#\#)) # an eval or line-based
                                             # comment preceded by a
                                             # consumed newline and whitespace

mako/mako/lexer.py

Lines 74 to 75 in dc66614

    
        (start, end) = match.span()
        self.match_position = end + 1 if end == start else end
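
The one-space loss can be reproduced outside the lexer. The sketch below isolates the line-start alternative from match_text and mimics the position update from lines 74 to 75; `next_position` is a hypothetical helper written for this comment, not part of Mako.

```python
import re

# The "(.*?)" prefix plus the line-start alternative from match_text,
# isolated here for illustration (non-verbose, so "#" needs no escape).
LINE_START = re.compile(r"(.*?)(?<=\n)(?=[ \t]*(?=%|##))", re.S)


def next_position(template, position):
    """Hypothetical helper mimicking the position update quoted above."""
    match = LINE_START.match(template, position)
    if match is None:
        return position
    start, end = match.span()
    # A zero-width match still advances the position by one character.
    return end + 1 if end == start else end


# 4 spaces before %
template = "\n    %%%% do something\n"

# First call: "(.*?)" consumes the newline, span is (0, 1).
first = next_position(template, 0)

# Second call: the lookarounds match with zero width at position 1,
# so the position jumps to 2 and the first space is skipped.
second = next_position(template, first)
```

Under these assumptions `second` ends up one character past the line start, which matches the missing space in the rendered output.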

cocolato · 2024-01-18T03:54:49Z

This regex change fixes the issue mentioned above and works correctly:
(?<=\n)(?=[ \t]*(?=%|\#\#))
is rewritten to
(?<=\n)(?=[ \t]*(?=%(?!%)|\#\#))

The regex block now:

        match = self.match(
            r"""
                (.*?)         # anything, followed by:
                (
                 (?<=\n)(?=[ \t]*(?=%(?!%)|\#\#)) # an eval or line-based
                                             # comment preceded by a
                                             # consumed newline and whitespace
                 |
                 (?<!%)(?=%%+) # consume the first percent sign out of a group of percent signs
                 |
                 (?=\${)      # an expression
                 |
                 (?=</?[%&])  # a substitution or block or call start or end
                              # - don't consume
                 |
                 (\\\r?\n)    # an escaped newline  - throw away
                 |
                 \Z           # end of string
                )""",
            re.X | re.S,
        )

However, there might be room for improvement, as I am not very familiar with regular expressions.
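
As a quick check of the rewrite, this sketch compares the old and new line-start lookaheads on an escaped-percent line and on a control line. The sample strings are made up, and the patterns are written without re.X, so `#` is left unescaped.

```python
import re

# Old and new versions of the line-start alternative.
old_line_start = re.compile(r"(?<=\n)(?=[ \t]*(?=%|##))")
new_line_start = re.compile(r"(?<=\n)(?=[ \t]*(?=%(?!%)|##))")

escaped = "\n    %%%% do something\n"
control = "\n    % for x in y:\n"

# Position 1 is the first character after the leading newline.
old_hits_escaped = old_line_start.match(escaped, 1) is not None
new_hits_escaped = new_line_start.match(escaped, 1) is not None
old_hits_control = old_line_start.match(control, 1) is not None
new_hits_control = new_line_start.match(control, 1) is not None
```

Only the old pattern produces a zero-width match in front of the escaped line; both still match in front of a real control line.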

zzzeek

can you make those changes that you suggested and then also add a test to confirm the case that you found wasn't working? update here and I can then update the gerrit, thanks

cocolato · 2024-01-19T03:18:30Z

The test code and regex have been updated.

sqla-tester

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision 5d3e742 of this pull request into gerrit so we can run tests and reviews and stuff

sqla-tester · 2024-01-19T13:28:02Z

Patchset 5d3e742 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111

zzzeek · 2024-01-19T13:30:47Z

> However, there might be room for improvement, as I am not very familiar with regular expressions.

you're working at expert level already so consider yourself familiar!

sqla-tester

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision f282029 of this pull request into gerrit so we can run tests and reviews and stuff

sqla-tester · 2024-01-19T14:55:24Z

Patchset f282029 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111

zzzeek · 2024-01-19T15:27:14Z

for the formatting, use the tooling we have :

cd /path/to/mako
pre-commit install
pre-commit run --all

cocolato · 2024-01-19T15:48:41Z

Thanks, it has been updated!

sqla-tester

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision ab8e747 of this pull request into gerrit so we can run tests and reviews and stuff

sqla-tester · 2024-01-19T16:11:37Z

Patchset ab8e747 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111

sqla-tester · 2024-01-22T20:30:26Z

Michael Bayer (zzzeek) wrote:

thank you!

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111

sqla-tester · 2024-01-22T20:30:28Z

Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111 has been merged. Congratulations! :)

zzzeek · 2024-01-22T20:49:46Z

mako 1.3.1 is released. now we'll find out if anyone was relying on two percent signs rendering as two of them

zzzeek · 2024-01-25T20:00:29Z

Hi -

I've reverted the change. Can we please try again, adding new tests that test the case in #384

$ pip install mako==1.3.0
...
Successfully installed mako-1.3.0
$ mako-render - <<< foo%%bar
foo%%bar
$ pip install mako==1.3.1
...
Successfully installed mako-1.3.1
$ mako-render - <<< foo%%bar
foo%bar

The change needs to be limited to only percent signs as the first non-whitespace character, not any percent signs.
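
A minimal sketch of the difference, using the `(?<!%)(?=%%+)` alternative from the reverted change and a line-anchored pattern that is only an assumption made up for this comparison (`line_leading` is not the pattern used in Mako):

```python
import re

# Alternative from the reverted change: fires in front of any run of
# two or more percent signs, wherever it appears.
anywhere = re.compile(r"(?<!%)(?=%%+)")

# Hypothetical line-anchored pattern: only whitespace may precede "%%".
line_leading = re.compile(r"^[ \t]*%%", re.M)

mid_line = "foo%%bar"
leading = "  %% note"

anywhere_mid = anywhere.search(mid_line)
leading_mid = line_leading.search(mid_line)
leading_start = line_leading.search(leading)
```

Here `anywhere` finds a boundary inside foo%%bar, which is the case reported in #384, while the line-anchored pattern leaves that input alone.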

zzzeek · 2024-01-25T20:00:41Z

Mako 1.3.1 is yanked

sqla-tester

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision ab8e747 of this pull request into gerrit so we can run tests and reviews and stuff

sqla-tester · 2024-01-25T20:01:18Z

New Gerrit review created for change ab8e747: https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

sqla-tester · 2024-01-25T20:01:54Z

Michael Bayer (zzzeek) wrote:

please add tests for double percents in the middle of non-whitespace lines

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

cocolato · 2024-01-26T02:43:25Z

Oh sorry, I will revise and test this part again.

sqla-tester

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision db93097 of this pull request into gerrit so we can run tests and reviews and stuff

sqla-tester · 2024-01-26T14:01:57Z

Patchset db93097 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

sqla-tester · 2024-01-26T14:20:51Z

Michael Bayer (zzzeek) wrote:

wow a whole new method. OK. I dont have a lot of time today so ill try to look more closely at this soon.

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

zzzeek

hi , here's a new patch that preserves the line numbers more accurately and produces fewer artifacts....can you test this and use this for the patch if it's OK?

diff --git a/mako/lexer.py b/mako/lexer.py
index 64ad491..15975b4 100644
--- a/mako/lexer.py
+++ b/mako/lexer.py
@@ -355,23 +355,11 @@ class Lexer:
             return True
 
     def match_percent(self):
-        match = self.match(r"([^%]*?)%(%+)", re.S)
+        match = self.match(r"(?<=^)(\s*)%%(%*)", re.M)
         if match:
-            text = match.group(1)
-            if text:
-                self.append_node(parsetree.Text, text)
-                for char in text[::-1]:  # Look back, check wheither '%'
-                    #  is the first non-whitespace character in a line
-                    if char == "\n":
-                        break
-                    elif char in ("\t", " "):
-                        continue
-                    else:
-                        self.append_node(parsetree.Text, "%")
-                        break
-                self.append_node(parsetree.Text, match.group(2))
-            else:
-                self.append_node(parsetree.Text, "%")
+            self.append_node(
+                parsetree.Text, match.group(1) + "%" + match.group(2)
+            )
             return True
         else:
             return False
diff --git a/test/test_lexer.py b/test/test_lexer.py
index 5bf148d..362ac70 100644
--- a/test/test_lexer.py
+++ b/test/test_lexer.py
@@ -200,10 +200,9 @@ class LexerTest(TemplateTest):
             TemplateNode(
                 {},
                 [
-                    Text("\n\n", (1, 1)),
-                    Text("%", (1, 1)),
-                    Text(" some whatever.\n\n    ", (3, 3)),
-                    Text("%", (3, 3)),
+                    Text("\n\n%", (1, 1)),
+                    Text(" some whatever.\n\n", (3, 3)),
+                    Text("    %", (5, 1)),
                     Text(" more some whatever\n", (5, 7)),
                     ControlLine("if", "if foo:", False, (6, 1)),
                     ControlLine("if", "endif", True, (7, 1)),
@@ -226,9 +225,9 @@ if <some condition>:
                 [
                     Text("%", (1, 1)),
                     Text(" do something\n", (1, 3)),
-                    Text("%%", (1, 3)),
-                    Text(" do something\nif <some condition>:\n    ", (2, 4)),
-                    Text("%%%", (2, 4)),
+                    Text("%%", (2, 1)),
+                    Text(" do something\nif <some condition>:\n", (2, 4)),
+                    Text("    %%%", (4, 1)),
                     Text(" do something\n        ", (4, 9)),
                 ],
             ),
@@ -248,8 +247,7 @@ if <some condition>:
                 [
                     Text("\n", (1, 1)),
                     ControlLine("for", "for i in [1, 2, 3]:", False, (2, 1)),
-                    Text("    ", (3, 1)),
-                    Text("%", (3, 1)),
+                    Text("    %", (3, 1)),
                     Text(" do something ", (3, 7)),
                     Expression("i", [], (3, 21)),
                     Text("\n", (3, 25)),
@@ -269,12 +267,8 @@ bar %% baz
             TemplateNode(
                 {},
                 [
-                    Text("\n", (1, 1)),
-                    Text("%", (1, 1)),
-                    Text(" foo\nbar ", (2, 3)),
-                    Text("%", (2, 3)),
-                    Text("%", (2, 3)),
-                    Text(" baz\n", (3, 7)),
+                    Text("\n%", (1, 1)),
+                    Text(' foo\nbar %% baz\n', (2, 3))
                 ],
             ),
         )
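
For reference, here is a small standalone sketch of what the new match_percent regex does to a few inputs. The regex and the group concatenation are taken from the patch above; the helper `unescape_at` is hypothetical and skips the lexer's node bookkeeping.

```python
import re

# Regex from the patched match_percent.
PERCENT_ESCAPE = re.compile(r"(?<=^)(\s*)%%(%*)", re.M)


def unescape_at(template, position):
    """Hypothetical helper: return the text the patch would emit if an
    escaped percent starts at `position`, else None."""
    match = PERCENT_ESCAPE.match(template, position)
    if match is None:
        return None
    # Leading whitespace, one percent dropped, remaining percents kept.
    return match.group(1) + "%" + match.group(2)


indented = unescape_at("    %%%% do something\n", 0)
blank_lines = unescape_at("\n\n%% some whatever.\n", 0)
mid_line_from_start = unescape_at("bar %% baz\n", 0)
mid_line_at_space = unescape_at("bar %% baz\n", 3)
```

Note that `\s*` also spans newlines, which is why the first expected node in the updated test is `"\n\n%"`.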

cocolato · 2024-01-26T23:57:16Z

Everything has been updated. Thanks for taking the time to review!

sqla-tester

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision ff8ce2e of this pull request into gerrit so we can run tests and reviews and stuff

sqla-tester · 2024-01-29T16:01:25Z

Patchset ff8ce2e added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

sqla-tester · 2024-01-30T13:25:43Z

Michael Bayer (zzzeek) wrote:

thank you again! we try again. can keep yanking/reverting til it works :)

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

sqla-tester · 2024-01-30T13:25:46Z

Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141 has been merged. Congratulations! :)

fiendish · 2024-01-31T22:06:29Z

That positive lookbehind appears to do nothing. Your tests still pass if you use just r"^(\s*)%%(%*)"

cocolato · 2024-02-01T03:35:57Z

Ahh, it seems that there is no difference between the two in MULTILINE mode. If the positive lookbehind will affect performance, maybe it should be changed here.

zzzeek · 2024-02-01T13:23:22Z

I believe the positive lookbehind does not get exercised because a pair of percent signs preceded by characters on a line would have been consumed by the "text" regexp. However, if that were not the case, then this regexp would still need to confirm the percent signs occur subsequent to the beginning of a line with only whitespace in between.

>>> import re
>>> x = "    %%  \n   %%   \n hello %%"
>>> re.compile(r"(?<=^)(\s*)%%(%*)", re.M).match(x, 24)
>>> re.compile(r"(\s*)%%(%*)", re.M).match(x, 24)
<re.Match object; span=(24, 27), match=' %%'>

> will affect performance,

it would be negligible, and in fact the lookbehind version is slightly faster (note this is one million calls to the match):

>>> re1 = re.compile(r"(?<=^)(\s*)%%(%*)", re.M)
>>> re2 = re.compile(r"(\s*)%%(%*)", re.M)
>>> import timeit
>>> timeit.timeit("re1.match(x, 24)", "from __main__ import x, re1")
0.08006502594798803
>>> timeit.timeit("re2.match(x, 24)", "from __main__ import x, re2")
0.1264837539056316

the overhead of a new python call to "match_percent()" is going to be much more significant than the regex itself, and this is fine. compilation performance is not really at stake here; these are all minimal changes.

> maybe it should be changed here.

definitely not
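
To put the three variants discussed in this thread side by side, here is a sketch that reuses the string `x` from the session above and checks each pattern at a line start (position 9) and in the middle of a line (position 24). The variable names are made up for illustration.

```python
import re

x = "    %%  \n   %%   \n hello %%"

lookbehind = re.compile(r"(?<=^)(\s*)%%(%*)", re.M)  # merged version
caret = re.compile(r"^(\s*)%%(%*)", re.M)            # suggested variant
unanchored = re.compile(r"(\s*)%%(%*)", re.M)        # no anchor at all

# Position 9 is the start of the second line.
at_line_start = [p.match(x, 9) is not None
                 for p in (lookbehind, caret, unanchored)]

# Position 24 is the space before the trailing "%%" on the last line.
mid_line = [p.match(x, 24) is not None
            for p in (lookbehind, caret, unanchored)]
```

In this sketch the two anchored patterns agree at both positions, and only the unanchored one matches in the middle of a line.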

fix percent escape && add test

773b6bd

cocolato changed the title ~~fix percent escape && add test~~ fix percent escape not working when not at the beginning of the line Jan 17, 2024

zzzeek requested a review from sqla-tester January 17, 2024 13:51

sqla-tester reviewed Jan 17, 2024

zzzeek requested changes Jan 17, 2024

zzzeek requested changes Jan 18, 2024

fix regex && add test

5d3e742

zzzeek requested a review from sqla-tester January 19, 2024 13:27

sqla-tester reviewed Jan 19, 2024

zzzeek requested a review from sqla-tester January 19, 2024 14:55

sqla-tester reviewed Jan 19, 2024

format code style

ab8e747

cocolato force-pushed the fix_persent_escape branch from 758ab44 to ab8e747 Compare January 19, 2024 15:47

zzzeek requested a review from sqla-tester January 19, 2024 16:11

sqla-tester reviewed Jan 19, 2024

kroeschl mentioned this pull request Jan 25, 2024

%% anywhere in input is now replaced with % #384

Closed

zzzeek reopened this Jan 25, 2024

zzzeek requested a review from sqla-tester January 25, 2024 20:01

sqla-tester reviewed Jan 25, 2024

zzzeek requested a review from sqla-tester January 26, 2024 14:01

sqla-tester reviewed Jan 26, 2024

zzzeek requested changes Jan 26, 2024

don't replace percent when it is not the first non-whitespace character

ff8ce2e

cocolato force-pushed the fix_persent_escape branch from db93097 to ff8ce2e Compare January 26, 2024 23:54

zzzeek requested a review from sqla-tester January 29, 2024 16:01

sqla-tester reviewed Jan 29, 2024

sqlalchemy-bot closed this in 1d6c58e Jan 30, 2024

zzzeek mentioned this pull request May 7, 2024

Update parsetree.py removed "?" from for x in re.compile(r"(\${.+})" … #397

Closed
