Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix percent escape not working when not at the beginning of the line #383

Closed
wants to merge 4 commits into from

Conversation

cocolato
Copy link
Contributor

Fixes: #323

The Lexer now will generate wrong result when there is no \n before %%.

Case:

from mako.template import Template


template = """%% do something
%%% do something
if <some condition>:
    %%%% do something
        """


print(Template(template).render())

The result before fix:

%% do something
%% do something
if <some condition>:
   %%%% do something

The result after fix:

% do something
%% do something
if <some condition>:
   %%% do something

@cocolato cocolato changed the title fix percent escape && add test fix percent escape not working when not at the beginning of the line Jan 17, 2024
@cocolato
Copy link
Contributor Author

Hi, @zzzeek can you please take a look? The fix passed all the tests on my local machine。

@zzzeek
Copy link
Member

zzzeek commented Jan 17, 2024

this does change the output of rendering so I'm a little concerned about old templates relying on the broken behavior. I can just put it out there as 1.3.1 and see what happens

Copy link
Collaborator

@sqla-tester sqla-tester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision 773b6bd of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Copy link
Collaborator

New Gerrit review created for change 773b6bd: https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111

Copy link
Member

@zzzeek zzzeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some simplifications to the regex, can you try what I suggested ? the tests all pass

mako/lexer.py Outdated
@@ -357,10 +357,12 @@ def match_text(self):
r"""
(.*?) # anything, followed by:
(
(?<=\n)(?=[ \t]*(?=%|\#\#)) # an eval or line-based
((?<=\n)|^)(?=[ \t]*(?=%|\#\#)) # an eval or line-based
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i dont think the |^ is needed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we can remove the "%" from this line

Copy link
Contributor Author

@cocolato cocolato Jan 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review ! I think the '%' here cannot be removed. Doing so would cause the regex in match_text to fail to consume the newline character '\n' in cases such as \n %for. Removal would result in the match_control_line's regular expression failing to recognize these control keywords.

mako/mako/lexer.py

Lines 424 to 429 in dc66614

def match_control_line(self):
match = self.match(
r"(?<=^)[\t ]*(%(?!%)|##)[\t ]*((?:(?:\\\r?\n)|[^\r\n])*)"
r"(?:\r?\n|\Z)",
re.M,
)

mako/lexer.py Outdated
# comment preceded by a
# consumed newline and whitespace
|
(?<!%)(?=%%+)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can read:

                 (?<!%)(?=%%+)  # consume the first percent sign out of a group of percent signs

so the overall block looks like:

        match = self.match(
            r"""
                (.*?)         # anything, followed by:
                (
                 (?<=\n)(?=[ \t]*(?=\#\#)) # an eval or line-based
                                             # comment, preceded by a
                                             # consumed newline and whitespace
                 |
                 (?<!%)(?=%%+)  # consume the first percent sign out of a group of percent signs
                 |
                 (?=\${)      # an expression
                 |
                 (?=</?[%&])  # a substitution or block or call start or end
                              # - don't consume
                 |
                 (\\\r?\n)    # an escaped newline  - throw away
                 |
                 \Z           # end of string
                )""",
            re.X | re.S,
        )

@cocolato
Copy link
Contributor Author

cocolato commented Jan 18, 2024

At the same time, I found that the current fix is incorrect and generates results that are missing spaces in some escape percent cases.

case:

from mako.template import Template

# 4 spaces before %
template = """
    %%%% do something
"""
print(Template(template).render())

result only has 3 spaces before "%":

   %%% do something

@cocolato
Copy link
Contributor Author

The logic of the code here will consume one spcae:

mako/mako/lexer.py

Lines 360 to 362 in dc66614

(?<=\n)(?=[ \t]*(?=%|\#\#)) # an eval or line-based
# comment preceded by a
# consumed newline and whitespace

mako/mako/lexer.py

Lines 74 to 75 in dc66614

(start, end) = match.span()
self.match_position = end + 1 if end == start else end

@cocolato
Copy link
Contributor Author

This regex block fixes the issues mentioned above and works correctly.
(?<=\n)(?=[ \t]*(?=%|\#\#))
rewrite to
(?<=\n)(?=[ \t]*(?=%(?!%)|\#\#))

The regex block now:

        match = self.match(
            r"""
                (.*?)         # anything, followed by:
                (
                 (?<=\n)(?=[ \t]*(?=%(?!%)|\#\#)) # an eval or line-based
                                             # comment preceded by a
                                             # consumed newline and whitespace
                 |
                 (?<!%)(?=%%+) # consume the first percent sign out of a group of percent signs
                 |
                 (?=\${)      # an expression
                 |
                 (?=</?[%&])  # a substitution or block or call start or end
                              # - don't consume
                 |
                 (\\\r?\n)    # an escaped newline  - throw away
                 |
                 \Z           # end of string
                )""",
            re.X | re.S,
        )

However, there might be room for improvement, as I am not very familiar with regular expressions.

Copy link
Member

@zzzeek zzzeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make those changes that you suggested and then also add a test to confirm the case that you found wasn't working? update here and I can then update the gerrit, thanks

@cocolato
Copy link
Contributor Author

The test code and regex has been updated.

Copy link
Collaborator

@sqla-tester sqla-tester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision 5d3e742 of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Copy link
Collaborator

Patchset 5d3e742 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111

@zzzeek
Copy link
Member

zzzeek commented Jan 19, 2024

However, there might be room for improvement, as I am not very familiar with regular expressions.

you're working at expert level already so consider yourself familiar!

Copy link
Collaborator

@sqla-tester sqla-tester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision f282029 of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Copy link
Collaborator

Patchset f282029 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111

@zzzeek
Copy link
Member

zzzeek commented Jan 19, 2024

for the formatting, use the tooling we have :

cd /path/to/mako
pre-commit install
pre-commit run --all

@cocolato
Copy link
Contributor Author

Thanks, it has been updated!

Copy link
Collaborator

@sqla-tester sqla-tester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision ab8e747 of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Copy link
Collaborator

Patchset ab8e747 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111

@sqla-tester
Copy link
Collaborator

Michael Bayer (zzzeek) wrote:

thank you!

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111

@sqla-tester
Copy link
Collaborator

Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5111 has been merged. Congratulations! :)

@zzzeek
Copy link
Member

zzzeek commented Jan 22, 2024

mako 1.3.1 is released. now we'll find out if anyone was relying on two percent signs rendering as two of them

@zzzeek
Copy link
Member

zzzeek commented Jan 25, 2024

Hi -

I've reverted the change. Can we please try again, adding new tests that test the case in #384

$ pip install mako==1.3.0
...
Successfully installed mako-1.3.0
$ mako-render - <<< foo%%bar
foo%%bar
$ pip install mako==1.3.1
...
Successfully installed mako-1.3.1
$ mako-render - <<< foo%%bar
foo%bar

The change needs to be limited to only percent signs as the first non-whitespace character, not any percent signs.

@zzzeek zzzeek reopened this Jan 25, 2024
@zzzeek
Copy link
Member

zzzeek commented Jan 25, 2024

Mako 1.3.1 is yanked

Copy link
Collaborator

@sqla-tester sqla-tester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision ab8e747 of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Copy link
Collaborator

New Gerrit review created for change ab8e747: https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

@sqla-tester
Copy link
Collaborator

Michael Bayer (zzzeek) wrote:

please add tests for double percents in the middle of non-whitespace lines

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

@cocolato
Copy link
Contributor Author

Oh sorry, I will revise and test this part again.

Copy link
Collaborator

@sqla-tester sqla-tester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision db93097 of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Copy link
Collaborator

Patchset db93097 added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

@sqla-tester
Copy link
Collaborator

Michael Bayer (zzzeek) wrote:

wow a whole new method. OK. I dont have a lot of time today so ill try to look more closely at this soon.

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

Copy link
Member

@zzzeek zzzeek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi , here's a new patch that preserves the line numbers more accurately and produces fewer artifacts....can you test this and use this for the patch if it's OK?

diff --git a/mako/lexer.py b/mako/lexer.py
index 64ad491..15975b4 100644
--- a/mako/lexer.py
+++ b/mako/lexer.py
@@ -355,23 +355,11 @@ class Lexer:
             return True
 
     def match_percent(self):
-        match = self.match(r"([^%]*?)%(%+)", re.S)
+        match = self.match(r"(?<=^)(\s*)%%(%*)", re.M)
         if match:
-            text = match.group(1)
-            if text:
-                self.append_node(parsetree.Text, text)
-                for char in text[::-1]:  # Look back, check wheither '%'
-                    #  is the first non-whitespace character in a line
-                    if char == "\n":
-                        break
-                    elif char in ("\t", " "):
-                        continue
-                    else:
-                        self.append_node(parsetree.Text, "%")
-                        break
-                self.append_node(parsetree.Text, match.group(2))
-            else:
-                self.append_node(parsetree.Text, "%")
+            self.append_node(
+                parsetree.Text, match.group(1) + "%" + match.group(2)
+            )
             return True
         else:
             return False
diff --git a/test/test_lexer.py b/test/test_lexer.py
index 5bf148d..362ac70 100644
--- a/test/test_lexer.py
+++ b/test/test_lexer.py
@@ -200,10 +200,9 @@ class LexerTest(TemplateTest):
             TemplateNode(
                 {},
                 [
-                    Text("\n\n", (1, 1)),
-                    Text("%", (1, 1)),
-                    Text(" some whatever.\n\n    ", (3, 3)),
-                    Text("%", (3, 3)),
+                    Text("\n\n%", (1, 1)),
+                    Text(" some whatever.\n\n", (3, 3)),
+                    Text("    %", (5, 1)),
                     Text(" more some whatever\n", (5, 7)),
                     ControlLine("if", "if foo:", False, (6, 1)),
                     ControlLine("if", "endif", True, (7, 1)),
@@ -226,9 +225,9 @@ if <some condition>:
                 [
                     Text("%", (1, 1)),
                     Text(" do something\n", (1, 3)),
-                    Text("%%", (1, 3)),
-                    Text(" do something\nif <some condition>:\n    ", (2, 4)),
-                    Text("%%%", (2, 4)),
+                    Text("%%", (2, 1)),
+                    Text(" do something\nif <some condition>:\n", (2, 4)),
+                    Text("    %%%", (4, 1)),
                     Text(" do something\n        ", (4, 9)),
                 ],
             ),
@@ -248,8 +247,7 @@ if <some condition>:
                 [
                     Text("\n", (1, 1)),
                     ControlLine("for", "for i in [1, 2, 3]:", False, (2, 1)),
-                    Text("    ", (3, 1)),
-                    Text("%", (3, 1)),
+                    Text("    %", (3, 1)),
                     Text(" do something ", (3, 7)),
                     Expression("i", [], (3, 21)),
                     Text("\n", (3, 25)),
@@ -269,12 +267,8 @@ bar %% baz
             TemplateNode(
                 {},
                 [
-                    Text("\n", (1, 1)),
-                    Text("%", (1, 1)),
-                    Text(" foo\nbar ", (2, 3)),
-                    Text("%", (2, 3)),
-                    Text("%", (2, 3)),
-                    Text(" baz\n", (3, 7)),
+                    Text("\n%", (1, 1)),
+                    Text(' foo\nbar %% baz\n', (2, 3))
                 ],
             ),
         )

@cocolato
Copy link
Contributor Author

All has been updated . Thanks for taking the time to review !

Copy link
Collaborator

@sqla-tester sqla-tester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, this is sqla-tester setting up my work on behalf of zzzeek to try to get revision ff8ce2e of this pull request into gerrit so we can run tests and reviews and stuff

@sqla-tester
Copy link
Collaborator

Patchset ff8ce2e added to existing Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

@sqla-tester
Copy link
Collaborator

Michael Bayer (zzzeek) wrote:

thank you again! we try again. can keep yanking/reverting til it works :)

View this in Gerrit at https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141

@sqla-tester
Copy link
Collaborator

Gerrit review https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/5141 has been merged. Congratulations! :)

@fiendish
Copy link

That positive lookbehind appears to do nothing. Your tests still pass if you use just r"^(\s*)%%(%*)"

@cocolato
Copy link
Contributor Author

cocolato commented Feb 1, 2024

Ahh, it seems that there is no difference between the two in MULTILINE mode, if the positive lookbehind will affect performance, maybe it should be changed here.

@zzzeek
Copy link
Member

zzzeek commented Feb 1, 2024

I believe the positive lookbehind does not get exercised because a pair of percent signs preceded by characters on a line would have been consumed by the "text" regexp. However, if that were not the case, then this regexp would still need to confirm the percent signs occur subsequent to the beginning of a line with only whitespace in between.

>>> import re
>>> x = "    %%  \n   %%   \n hello %%"
>>> re.compile(r"(?<=^)(\s*)%%(%*)", re.M).match(x, 24)
>>> re.compile(r"(\s*)%%(%*)", re.M).match(x, 24)
<re.Match object; span=(24, 27), match=' %%'>

will affect performance,

it would be negligible, and in fact the lookbehind version is slightly faster (note this is one million calls to the match):

>>> re1 = re.compile(r"(?<=^)(\s*)%%(%*)", re.M)
>>> re2 = re.compile(r"(\s*)%%(%*)", re.M)
>>> import timeit
>>> timeit.timeit("re1.match(x, 24)", "from __main__ import x, re1")
0.08006502594798803
>>> timeit.timeit("re2.match(x, 24)", "from __main__ import x, re2")
0.1264837539056316

the overhead of a new python call to "match_percent()" is going to be much more significant than the reg itself, and this is fine. compilation performance is not really at stake here these are all minimal changes.

maybe it should be changed here.

definitely not

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

%% not working when not at the beginning of the line
4 participants