
Use utf-8 as default encoding to read files #992

Open: wants to merge 1 commit into master

Conversation

dobesv commented Apr 3, 2024

The encoding set on the transform is technically supposed to be used for the chunks going through the transform itself, and it's not necessarily set for a Transform, especially in object mode. This change falls back on `utf-8` as the encoding when loading a file from disk.

I suspect this should just always use `utf-8`, but I can't say for sure there isn't a use case for using the `encoding` parameter.
dobesv (Author) commented Apr 3, 2024

Fixes #993

karellm (Member) commented Apr 6, 2024

Based on the docs, passing the encoding option changes the output from a Buffer to a string. Can you describe in more detail what changed in Node 20.12?

dobesv (Author) commented Apr 9, 2024

To be honest, I don't know exactly what they changed in Node v20.12; I couldn't figure it out from their release notes.

However, by switching versions and running in the debugger, I found that in earlier versions the encoding parameter to this function would be set to utf-8, but in Node v20.12 and later the encoding was undefined.

I suppose that seems right, in a way: since the stream is in object mode, there's no reason for it to have an encoding.

If you look a few lines up from my change you can see that content is expected to be a string here, e.g.

      content = file.contents.toString('utf8')

When it is not a string, I get errors in the TypeScript parser:

    TypeError: s.codePointAt is not a function
        at codePointAt (xxx/.yarn/cache/typescript-patch-32ada147aa-5659316360.zip/node_modules/typescript/lib/typescript.js:12414:81)
        at Object.scan (xxx/.yarn/cache/typescript-patch-32ada147aa-5659316360.zip/node_modules/typescript/lib/typescript.js:11509:26)

That's because `s` was passed as a Buffer instead of a string, and Buffer doesn't have a `codePointAt` method. By ensuring that we always provide an encoding, we ensure that `s` is a string and thus implements the interface the TypeScript parser expects.
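The failure mode in two lines: a Buffer (a Uint8Array subclass) has no `codePointAt` method, while the string produced by decoding it does.

```javascript
const buf = Buffer.from('abc', 'utf8');

// Buffer exposes byte-level methods, not string methods like codePointAt.
console.log(typeof buf.codePointAt);                  // 'undefined'
// Decoding the Buffer yields a string, which does have codePointAt.
console.log(typeof buf.toString('utf8').codePointAt); // 'function'
```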

dobesv (Author) commented Apr 9, 2024

As I mentioned in the description, I suspect the change should maybe just be:

      content = fs.readFileSync(file.path, 'utf8')

I'm not sure exactly why the incoming encoding is relevant to the source file's encoding. But maybe there's some use case for that, like in gulp or something. I don't know.
