fs: cannot interact with invalid UTF-16 filenames on Windows, even with Buffers #23735

Closed
rossj opened this issue Oct 18, 2018 · 8 comments
Labels
  • fs: Issues and PRs related to the fs subsystem / file system.
  • help wanted: Issues that need assistance from volunteers or PRs that need help to proceed.
  • libuv: Issues and PRs related to the libuv dependency or the uv binding.
  • windows: Issues and PRs related to the Windows platform.

Comments

@rossj

rossj commented Oct 18, 2018

  • Version: 10.12.0
  • Platform: Windows 10 64-bit
  • Subsystem: fs

PR #5616 gave us support for Buffer paths in all fs methods, primarily to allow interacting with files of unknown or invalid file encoding. This helps on UNIX/Linux where filenames are technically just strings of bytes and do not necessarily represent a valid UTF-8 string.

Similarly, on Windows, filenames are just arrays of wchars and do not necessarily represent a valid UTF-16 string. However, the current { encoding: 'buffer' } variants of the fs methods do not handle this case properly. Instead, the Buffers that are returned are UTF-8 representations of (potentially lossily / incorrectly) decoded UTF-16 filenames. It is also not possible to pass as input Buffers that contain the raw UTF-16 bytes. As a result, there can be files that Node can't interact with at all.
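To make the lossy step concrete, here is a minimal sketch that runs on any platform. It uses the standard TextEncoder / TextDecoder globals rather than libuv's own conversion routine, so it only illustrates the kind of loss described above; it is not the code path fs actually takes. The string holds the same code units as the filename in this report.

```javascript
'use strict';

// "hi", an unpaired high surrogate (0xD801), then "7".
const name = 'hi\uD801' + '7';

// Encoding to UTF-8 replaces the unpaired surrogate with U+FFFD (EF BF BD).
const utf8 = new TextEncoder().encode(name);
const hex = Buffer.from(utf8).toString('hex');
console.log(hex); // 6869efbfbd37

// Decoding those bytes cannot recover the original code unit.
const roundTripped = new TextDecoder().decode(utf8);
console.log(roundTripped === name); // false
```

Once the surrogate has been replaced, any name built from those bytes refers to a different (and probably nonexistent) file.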

Consider the following code that makes a file that doesn't have a proper UTF-16 name. The created file can be seen and interacted with using Windows Explorer and Notepad without issue.

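#include "stdafx.h"
#include <iostream>
#include <windows.h>

int main()
{
    // Unpaired high surrogate (0xD801) followed by '7' (0x0037): not valid UTF-16
    const wchar_t *filename = L"hi\xD801\x0037";
    HANDLE hfile = CreateFileW(filename, GENERIC_READ, 0, NULL, CREATE_NEW, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hfile == INVALID_HANDLE_VALUE) {
        // CREATE_NEW fails if the file already exists
        std::cerr << "CreateFileW failed: " << GetLastError() << std::endl;
        return 1;
    }
    CloseHandle(hfile);
    return 0;
}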

Then, running the following Node code in the same directory shows that the file cannot be accessed:

const fs = require('fs');

const bufs = fs.readdirSync('.\\', { encoding: 'buffer' });
for (const buf of bufs) {
    try {
        const stat = fs.statSync(buf);
        console.log('successfully got stats of: ' + buf.toString('utf8'));
    } catch (err) {
        console.log('error getting stats of: ' + buf.toString('utf8'));
    }
}

The above code produces the following output when run in the same dir as the invalid UTF-16 file:

error getting stats of: hi�7
successfully got stats of: test.js
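A rough way to spot names that may have gone through this lossy conversion is to look for the UTF-8 bytes of U+FFFD in the Buffers that readdirSync returns. The sketch below assumes the first Buffer holds EF BF BD where the surrogate was, which is what the printed output suggests but which this report does not show in hex. The helper name mayBeLossy is hypothetical, and the check is only a heuristic: a file can legitimately have U+FFFD in its name.

```javascript
'use strict';

// UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER.
const REPLACEMENT = Buffer.from([0xef, 0xbf, 0xbd]);

// Hypothetical helper: true if the name contains an encoded U+FFFD,
// which may indicate a lossy UTF-16 -> UTF-8 conversion.
function mayBeLossy(nameBuf) {
  return nameBuf.includes(REPLACEMENT);
}

// Stand-ins for the two readdirSync results (assumed byte contents).
const names = [
  Buffer.from([0x68, 0x69, 0xef, 0xbf, 0xbd, 0x37]), // "hi", U+FFFD, "7"
  Buffer.from('test.js', 'utf8'),
];

for (const nameBuf of names) {
  const verdict = mayBeLossy(nameBuf) ? 'possibly lossy' : 'ok';
  console.log(nameBuf.toString('hex') + ' ' + verdict);
}
```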

Refs:
#5616
rust-lang/rust#12056
jprichardson/node-fs-extra#612

addaleax added the fs, windows, and libuv labels Oct 18, 2018
@addaleax
Member

I think the issue here is that libuv attempts automatic UTF-8 → UTF-16 conversion for Windows file paths… /cc @nodejs/libuv

@bnoordhuis
Member

Correct. From http://docs.libuv.org/en/v1.x/fs.html:

Note: On Windows uv_fs_* functions use utf-8 encoding.

You feed it UTF-8 and libuv takes care of converting it to/from WCHAR.

@seishun
Contributor

seishun commented Nov 11, 2018

Perhaps Node.js could override WideCharToMultiByte and MultiByteToWideChar to make libuv use WTF-8 instead of UTF-8?
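For readers unfamiliar with WTF-8: it extends UTF-8 so that an unpaired surrogate is written as an ordinary 3-byte sequence instead of being replaced, which makes the conversion from arbitrary 16-bit code units reversible. The following is an illustrative sketch only. The function names encodeWtf8 and decodeWtf8 are made up for this example, the decoder does no validation, and none of this is libuv or Node.js code.

```javascript
'use strict';

// Encode a JavaScript string (a sequence of UTF-16 code units) as WTF-8.
// Valid surrogate pairs become 4-byte sequences, as in UTF-8; unpaired
// surrogates become 3-byte sequences instead of U+FFFD.
function encodeWtf8(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    let cp = str.charCodeAt(i);
    // Combine a high surrogate with a following low surrogate, if any.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
        i++;
      }
    }
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      bytes.push(0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f),
                 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    }
  }
  return Buffer.from(bytes);
}

// Decode well-formed WTF-8 back into a string. No validation is performed.
function decodeWtf8(buf) {
  let out = '';
  for (let i = 0; i < buf.length;) {
    const b = buf[i];
    let cp;
    let len;
    if (b < 0x80) { cp = b; len = 1; }
    else if (b < 0xe0) { cp = b & 0x1f; len = 2; }
    else if (b < 0xf0) { cp = b & 0x0f; len = 3; }
    else { cp = b & 0x07; len = 4; }
    // Fold in the continuation bytes, 6 bits each.
    for (let j = 1; j < len; j++) cp = (cp << 6) | (buf[i + j] & 0x3f);
    i += len;
    if (cp >= 0x10000) {
      cp -= 0x10000;
      out += String.fromCharCode(0xd800 + (cp >> 10), 0xdc00 + (cp & 0x3ff));
    } else {
      out += String.fromCharCode(cp);
    }
  }
  return out;
}

const original = 'hi\uD801' + '7';
const encoded = encodeWtf8(original);
console.log(encoded.toString('hex'));          // 6869eda08137
console.log(decodeWtf8(encoded) === original); // true
```

Unlike the UTF-8 conversion, the round trip here returns the exact code units, so the resulting bytes could in principle identify the file unambiguously.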

@Trott
Member

Trott commented Nov 21, 2018

@bnoordhuis @seishun @addaleax Should this be labeled blocked as waiting for an upstream fix in libuv? Or help wanted? Or closed as not-a-bug? Or something else?

@seishun
Contributor

seishun commented Jan 1, 2019

@Trott This might be possible to fix without changes in libuv, but I would like some input on my idea before I proceed with investigation.

jasnell added the help wanted label Jun 26, 2020
@santigimeno
Member

It seems to me this might already be fixed in libuv now that libuv/libuv#2970 has landed. Is this correct, @vtjnash? If so, the next libuv release will include it, and once that release lands in nodejs this will be fixed.

@vtjnash
Contributor

vtjnash commented May 25, 2023

Yes. Might need testing, but that is the expectation as long as nodejs doesn't have a strictly-validating utf8 check in the way
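As an illustration of what a strictly-validating check would do to such names, the sketch below feeds WTF-8-style bytes for the filename from this report to a TextDecoder in fatal mode. This is only an analogy for the kind of check mentioned here. Whether any such check sits on the fs path in Node.js is not established by this thread, and isStrictUtf8 is a hypothetical helper name.

```javascript
'use strict';

// WTF-8-style bytes for "hi", the unpaired surrogate 0xD801, then "7".
const surrogateName = Buffer.from([0x68, 0x69, 0xed, 0xa0, 0x81, 0x37]);

// Hypothetical helper: true only if the bytes are strictly valid UTF-8.
function isStrictUtf8(buf) {
  try {
    new TextDecoder('utf-8', { fatal: true }).decode(buf);
    return true;
  } catch {
    // An encoded surrogate (ED A0..BF xx) is not valid UTF-8.
    return false;
  }
}

console.log(isStrictUtf8(surrogateName));           // false
console.log(isStrictUtf8(Buffer.from('test.js')));  // true
```

A check like this anywhere between the caller and libuv would reject exactly the names this issue is about, which is why it would need to be absent (or relaxed) for the fix to take effect.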

@bnoordhuis
Member

I believe this is fixed now. Closing but holler if it should be reopened.

8 participants