
Use utf-8 as default encoding to read files #992

Open: wants to merge 1 commit into master

Conversation

dobesv commented Apr 3, 2024

The encoding set on the transform is technically supposed to be used for the chunks going through the transform itself, and it's not necessarily set for a Transform, especially in object mode. This change falls back on `utf-8` as the encoding when loading a file from disk.

I suspect this should just always use `utf-8`, but I can't say for sure there isn't a use case for using the `encoding` parameter.
dobesv (Author) commented Apr 3, 2024

Fixes #993

karellm (Member) commented Apr 6, 2024

Based on the docs, passing the encoding option changes the output from a Buffer to a string. Can you describe in more detail what changed in Node 20.12?

dobesv (Author) commented Apr 9, 2024

To be honest, I don't know exactly what they changed in Node v20.12; I couldn't figure it out from their release notes.

However, by switching versions and running in the debugger, I found that in earlier versions the encoding parameter to this function would be set to utf-8, but in Node v20.12 and later the encoding was undefined.

I suppose that seems right, in a way: since the stream is in object mode, there's no reason for it to have an encoding.

If you look a few lines up from my change you can see that content is expected to be a string here, e.g.

      content = file.contents.toString('utf8')

When it is not a string, I get errors in the TypeScript parser:

    TypeError: s.codePointAt is not a function
        at codePointAt (xxx/.yarn/cache/typescript-patch-32ada147aa-5659316360.zip/node_modules/typescript/lib/typescript.js:12414:81)
        at Object.scan (xxx/.yarn/cache/typescript-patch-32ada147aa-5659316360.zip/node_modules/typescript/lib/typescript.js:11509:26)

That's because `s` was passed as a Buffer instead of a string, and Buffer doesn't have a `codePointAt` method. By ensuring that we always provide an encoding, we ensure that `s` is a string and thus implements the interface the TypeScript parser expects.
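The failure mode in two lines: a Buffer (a Uint8Array subclass) has no `codePointAt` method, while the string produced by decoding it does.

```javascript
const buf = Buffer.from('abc', 'utf8');

// Buffer exposes byte-level methods, not string methods like codePointAt.
console.log(typeof buf.codePointAt);                  // 'undefined'
// Decoding the Buffer yields a string, which does have codePointAt.
console.log(typeof buf.toString('utf8').codePointAt); // 'function'
```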

dobesv (Author) commented Apr 9, 2024

As I mentioned in the description, I suspect the change should maybe just be:

      content = fs.readFileSync(file.path, 'utf8')

I'm not sure exactly why the incoming encoding is relevant to the source file's encoding. But maybe there's some use case for that, like in gulp or something. I don't know.
