Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mako.cmd.cmdline() reads template as binary but writes as text resulting in garbled newlines on Windows #376

Open
ni-balexand opened this issue Feb 15, 2023 · 9 comments

Comments

@ni-balexand
Copy link

The issue here can be demonstrated by running the following script on Windows:

import mako.cmd
import sys

with open("template_file", "w") as f:
    print("Line 1", file=f)
    print("Line 2", file=f)
    print("Line 3", file=f)

sys.argv.append("--output-file")
sys.argv.append("rendered_file")
sys.argv.append("template_file")

assert sys.argv[1:] == ["--output-file", "rendered_file", "template_file"]

# pass sys.argv off to mako
mako.cmd.cmdline()

# awkward \r\r\n EOL sequences is what is written to disk
with open("rendered_file", "rb") as f:
    rendered = f.read()
assert rendered == b'Line 1\r\r\nLine 2\r\r\nLine 3\r\r\n'

# If you were to open/read in text mode it shows up as two newlines
with open("rendered_file", "rt") as f:
    rendered = f.read()
assert rendered == 'Line 1\n\nLine 2\n\nLine 3\n\n'

It's odd that read_file() in util.py opens / reads in binary mode but then cmdline() in cmd.py opens / writes in text mode. The whole point of using text mode should be to convert EOLs on Windows, but it does not do this correctly.

ni-balexand added a commit to ni-balexand/mako that referenced this issue Feb 15, 2023
@zzzeek
Copy link
Member

zzzeek commented Feb 15, 2023

hi -

why would you not simply fix the file to have proper CR/NL combinations? (edit: OK I thought that was the input file, it's the output)

this part:

 assert rendered == b'Line 1\r\r\nLine 2\r\r\nLine 3\r\r\n'

looks wrong?

I'd really like to use Python's text mode here and not go back to a Python 2-ish approach.

@zzzeek
Copy link
Member

zzzeek commented Feb 15, 2023

I guess that written file is the bug. I'm not really understanding the problem yet. I dont want to write "b" at all.

@sqla-tester
Copy link
Collaborator

Mike Bayer has proposed a fix for this issue in the main branch:

switch to textmode in all cases https://gerrit.sqlalchemy.org/c/sqlalchemy/mako/+/4450

@zzzeek
Copy link
Member

zzzeek commented Feb 15, 2023

see above gerrit. does not work yet, as we have some super legacy BOM stuff I'd have to figure out and/or just remove.

the "reading bytes" thing is strictly python 2. if it's causing problems, it's out. I want to use the most modern Python features

@zzzeek
Copy link
Member

zzzeek commented Feb 15, 2023

OK I understand the issue now. here is the fix:

diff --git a/mako/cmd.py b/mako/cmd.py
index 93a6733..deb8b95 100755
--- a/mako/cmd.py
+++ b/mako/cmd.py
@@ -86,12 +86,12 @@ def cmdline(argv=None):
 
     kw = dict(varsplit(var) for var in options.var)
     try:
-        rendered = template.render(**kw)
+        rendered = template.render_unicode(**kw)
     except:
         _exit()
     else:
         if output_file:
-            open(output_file, "wt", encoding=output_encoding).write(rendered)
+            open(output_file, "wt", newline="", encoding=output_encoding).write(rendered)
         else:
             sys.stdout.write(rendered)
 

we have no need for text mode to convert newlines so we turn it off.

can you confirm this fixes all issues on your end? thanks

@ni-balexand
Copy link
Author

Yes, that fixes the issue.

I get the confusion now. Opening as text vs binary has several effects including:

  • whether read/write operates with str or bytes types
  • any necessary encoding / decoding
  • newline conversion

Opening in text mode with newline="" will turn off newline conversion while leaving the character encoding alone, and continuing to use the str type for read/write operations. It looks like render_unicode will return a str type just as render did.

@ni-balexand
Copy link
Author

I don't see any specific reason why we would need to change render to render_unicode as far as the newline problem I'm looking at (reverting back to just render still looks fine).

Do you think changing to render_unicode is necessary for other string encoding reasons?

@zzzeek
Copy link
Member

zzzeek commented Feb 15, 2023

render() returns either bytes or string while render_unicode always returns string.

@zzzeek
Copy link
Member

zzzeek commented Feb 15, 2023

right it's not strictly related but the way it is is technically wrong. eventually typing will be added here and it will change anyway

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants