Allow UTF-8 with BOM for features.fea #3495

NightFurySL2001 · 2024-05-01T14:29:57Z

anthrotype · 2024-05-24T15:32:58Z

Lib/fontTools/feaLib/lexer.py

@@ -269,7 +269,7 @@ def make_lexer_(file_or_path):
            fileobj, closing = file_or_path, False
        else:
            filename, closing = file_or_path, True
-            fileobj = open(filename, "r", encoding="utf-8")
+            fileobj = open(filename, "r", encoding="utf-8-sig")


What if the file is regular UTF-8 that does not start with a BOM? Will this still work?

Yes (as evidenced by the tests passing): this encoding just lets Python skip the BOM if it is present in UTF-8 text; otherwise it is functionally the same as normal utf-8.

https://docs.python.org/3/library/codecs.html#encodings-and-unicode

On decoding utf-8-sig will skip those three bytes if they appear as the first three bytes in the file.
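
The decoding behaviour described above can be checked with a small sketch. The sample feature text and variable names are made up for illustration and are not taken from the fontTools test suite.

```python
import codecs

# Hypothetical sample content; any UTF-8 text would do.
text = "languagesystem DFLT dflt;\n"
plain = text.encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain  # the three bytes EF BB BF, then the text

# utf-8-sig drops a leading BOM and otherwise decodes like utf-8.
decoded_plain = plain.decode("utf-8-sig")
decoded_bom = with_bom.decode("utf-8-sig")

# Plain utf-8 keeps the BOM as a U+FEFF character at the start of the text.
decoded_bom_as_utf8 = with_bom.decode("utf-8")

print(decoded_plain == text)
print(decoded_bom == text)
print(decoded_bom_as_utf8.startswith("\ufeff"))
```

With plain utf-8, the leftover U+FEFF character ends up at the start of the decoded text, which is presumably what the lexer was choking on before this change.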

Do note that this is useful for opening text files (especially ones made by Windows Notepad). Saving as utf-8-sig is not recommended, as it will add a BOM, which will break compatibility with tools that expect plain UTF-8.

I would personally suggest reading everything as utf-8-sig for greatest compatibility and saving as utf-8 for standardisation.

> I would personally suggest reading everything as utf-8-sig for greatest compatibility and saving as utf-8 for standardisation.
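
A minimal sketch of that read-as-utf-8-sig, save-as-utf-8 convention follows. The helper name `normalize_to_plain_utf8` and the temporary `features.fea` content are hypothetical and not part of fontTools.

```python
import codecs
import os
import tempfile


def normalize_to_plain_utf8(path):
    """Read a text file that may start with a BOM and rewrite it without one."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return text


# Writing with utf-8-sig is what puts the BOM there in the first place.
saved_with_sig = "feature liga { } liga;\n".encode("utf-8-sig")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.fea")
    with open(path, "wb") as f:
        f.write(saved_with_sig)

    text = normalize_to_plain_utf8(path)

    with open(path, "rb") as f:
        raw = f.read()

print(saved_with_sig.startswith(codecs.BOM_UTF8))
print(raw.startswith(codecs.BOM_UTF8))
```

Reading with utf-8-sig is harmless for files that have no BOM, so the same helper can be applied to every input file.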

SGTM.

Is this the only place where fontTools reads in human-written (potentially MS Notepad edited) text files? Probably not. But sure, let's merge this if it helps.

So far in the UFO building process, only features.fea has caused this problem. Other components are loaded through plistlib, which probably strips the BOM by default.
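
One way to check the guess about plistlib is a quick experiment like the one below. It uses the standard-library `plistlib`, which is an assumption: the UFO toolchain may go through a different plist reader whose behaviour could differ. The payload is made up.

```python
import codecs
import plistlib

# Hypothetical payload, serialized as an XML plist (UTF-8, no BOM).
payload = {"familyName": "Example"}
xml_bytes = plistlib.dumps(payload)
with_bom = codecs.BOM_UTF8 + xml_bytes

try:
    loaded = plistlib.loads(with_bom)
except Exception as exc:  # report any rejection instead of crashing
    loaded = None
    print("BOM-prefixed plist was rejected:", exc)
else:
    print("BOM-prefixed plist round-trips:", loaded == payload)
```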

NightFurySL2001 added 2 commits May 1, 2024 22:29

Allow UTF-8 with BOM for features.fea

cc02ada

Allow UTF-8 with BOM for features.fea

80db8cd

anthrotype reviewed May 24, 2024


anthrotype merged commit 4193aea into fonttools:main May 30, 2024
11 checks passed


NightFurySL2001 commented May 1, 2024

anthrotype May 24, 2024 • edited

NightFurySL2001 May 25, 2024 • edited

behdad May 25, 2024

anthrotype May 30, 2024

NightFurySL2001 May 30, 2024
