Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Treat mount point as symlink-like in finddata2dirent #4383

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jakebailey
Copy link

@jakebailey jakebailey commented Apr 12, 2023

Testing on a pnpm monorepo containing recursive symlinks, it turns out that dwReserved0 is actually IO_REPARSE_TAG_MOUNT_POINT.

Checking FILE_ATTRIBUTE_REPARSE_POINT and IO_REPARSE_TAG_MOUNT_POINT appears to match similar code in mingw.c's mingw_is_mount_point.

The new test fails without this change, but I am unsure whether or not it needs some sort of conditional that checks that symlinks are available in Windows. "SYMLINK" used in the test file appears to be false even though I have developer mode enabled, allowing them.

I'd appreciate some feedback or pointers here; I've never submitted code to git.

Fixes #3358

ref pnpm/pnpm#4401

t/t7300-clean.sh Show resolved Hide resolved
@@ -19,7 +19,7 @@ static inline void finddata2dirent(struct dirent *ent, WIN32_FIND_DATAW *fdata)

/* Set file type, based on WIN32_FIND_DATA */
if ((fdata->dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT)
&& fdata->dwReserved0 == IO_REPARSE_TAG_SYMLINK)
&& fdata->dwReserved0 == IO_REPARSE_TAG_MOUNT_POINT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we agree this is the right thing to do for IO_REPARSE_TAG_MOUNT_POINT, wouldn't we want to keep doing the same thing for IO_REPARSE_TAG_SYMLINK?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably; I'm happy to do both if that sounds correct, though no test changed. There is code that exists in the repo already that appears to treat both equivalently, though, not always.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a drive-by observation, but yeah, I think you'd want to check for either reparse_tag value here -- but I haven't actually tried to dig into this yet. There are several other places in the code where it looks for either type.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I appreicate the review. I'll add that on when I get a chance.

I would normally say that such a thing should get extracted to a helper so that everything can evenly decide what is "symlink-like", but truthfully I'm so rusty in C that I don't trust myself to make that work how people'd want 😄

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the PR to allow for both.

Testing on a pnpm monorepo containing recursive symlinks, it turns out
that dwReserved0 is actually IO_REPARSE_TAG_MOUNT_POINT.

Checking FILE_ATTRIBUTE_REPARSE_POINT and IO_REPARSE_TAG_MOUNT_POINT
appears to match similar code in mingw.c's mingw_is_mount_point.

The new test fails without this change, but I am unsure whether or not
it needs some sort of conditional that checks that symlinks are
available in Windows. "SYMLINK" used in the test file appears to be
false even though I have developer mode enabled, allowing them.

Signed-off-by: Jake Bailey <jabaile@microsoft.com>
@jakebailey jakebailey changed the title Check for mount point instead of symlink in finddata2dirent Treat mount point as symlink-like in finddata2dirent Apr 14, 2023
@dscho
Copy link
Member

dscho commented Apr 16, 2023

We had that discussion previously and came to the conclusion that mount points are not symlinks: #437 (comment)

The main take-away is that mount points are a fundamentally different thing than symbolic links and asking Git to treat them alike is prone to cause trouble.

Having said that, a compromise that might work for all involved could be to introduce a new Git attribute (similar to symlink=file) to tell Git to treat an NTFS junction at a given path as if it were a symbolic link. I am not quite certain about all the ramifications of that, though. Moving such worktrees using Explorer quite likely would create deep copies instead of recreating the NTFS junction, for example.

@@ -19,7 +19,7 @@ static inline void finddata2dirent(struct dirent *ent, WIN32_FIND_DATAW *fdata)

/* Set file type, based on WIN32_FIND_DATA */
if ((fdata->dwFileAttributes & FILE_ATTRIBUTE_REPARSE_POINT)
&& fdata->dwReserved0 == IO_REPARSE_TAG_SYMLINK)
&& (fdata->dwReserved0 == IO_REPARSE_TAG_SYMLINK || fdata->dwReserved0 == IO_REPARSE_TAG_MOUNT_POINT))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks quite similar to https://github.com/git-for-windows/git/pull/437/files#diff-a6fe7e03ddc742e3276990c88327b92ce59fff378c1be07ed782f0096e131932L20-R21 therefore I suspect that the corresponding change to file_attr_to_st_mode() might be missing.

@jakebailey
Copy link
Author

I think the thing that gives me pause is that not treating them as symlinks doesn't match the behavior of WSL or git-bash observing the same filesystem (or just ls in powershell).

For example, pnpm install in Windows, followed by ls while in WSL in the same dir nets:

total 216K
drwxrwxrwx 1 jabaile jabaile 4.0K Apr 16 10:44  .bin/
-rwxrwxrwx 1 jabaile jabaile 215K Apr 16 10:45  .modules.yaml*
drwxrwxrwx 1 jabaile jabaile 4.0K Apr 16 10:45  .pnpm/
drwxrwxrwx 1 jabaile jabaile 4.0K Apr 16 10:44 '@definitelytyped'/
drwxrwxrwx 1 jabaile jabaile 4.0K Apr 16 10:44 '@eslint'/
drwxrwxrwx 1 jabaile jabaile 4.0K Apr 16 10:44 '@eslint-community'/
drwxrwxrwx 1 jabaile jabaile 4.0K Apr 16 10:44 '@octokit'/
drwxrwxrwx 1 jabaile jabaile 4.0K Apr 16 10:44 '@types'/
drwxrwxrwx 1 jabaile jabaile 4.0K Apr 16 10:44 '@typescript-eslint'/
lrwxrwxrwx 1 jabaile jabaile   92 Apr 16 10:44  comment-json -> '/mnt/e/work/DefinitelyTyped/node_modules/.pnpm/comment-json@4.2.3/node_modules/comment-json/'/
lrwxrwxrwx 1 jabaile jabaile   84 Apr 16 10:44  d3-array -> '/mnt/e/work/DefinitelyTyped/node_modules/.pnpm/d3-array@3.2.3/node_modules/d3-array/'/
lrwxrwxrwx 1 jabaile jabaile   82 Apr 16 10:44  d3-axis -> '/mnt/e/work/DefinitelyTyped/node_modules/.pnpm/d3-axis@3.0.0/node_modules/d3-axis/'/
lrwxrwxrwx 1 jabaile jabaile   84 Apr 16 10:44  d3-scale -> '/mnt/e/work/DefinitelyTyped/node_modules/.pnpm/d3-scale@4.0.2/node_modules/d3-scale/'/
lrwxrwxrwx 1 jabaile jabaile   92 Apr 16 10:44  d3-selection -> '/mnt/e/work/DefinitelyTyped/node_modules/.pnpm/d3-selection@3.0.0/node_modules/d3-selection/'/
lrwxrwxrwx 1 jabaile jabaile   82 Apr 16 10:44  d3-time -> '/mnt/e/work/DefinitelyTyped/node_modules/.pnpm/d3-time@3.1.0/node_modules/d3-time/'/
...

git then does the right thing on a clean, because it is seeing symlinks.

ls -lh from git-bash is the same:

total 4.0K
drwxr-xr-x 1 REDMOND+jabaile 4096   0 Apr 16 10:44 '@definitelytyped'/
drwxr-xr-x 1 REDMOND+jabaile 4096   0 Apr 16 10:44 '@eslint'/
drwxr-xr-x 1 REDMOND+jabaile 4096   0 Apr 16 10:44 '@eslint-community'/
drwxr-xr-x 1 REDMOND+jabaile 4096   0 Apr 16 10:44 '@octokit'/
drwxr-xr-x 1 REDMOND+jabaile 4096   0 Apr 16 10:44 '@types'/
drwxr-xr-x 1 REDMOND+jabaile 4096   0 Apr 16 10:44 '@typescript-eslint'/
lrwxrwxrwx 1 REDMOND+jabaile 4096  88 Apr 16 10:44  comment-json -> '/e/work/DefinitelyTyped/node_modules/.pnpm/comment-json@4.2.3/node_modules/comment-json/'/
lrwxrwxrwx 1 REDMOND+jabaile 4096  80 Apr 16 10:44  d3-array -> '/e/work/DefinitelyTyped/node_modules/.pnpm/d3-array@3.2.3/node_modules/d3-array/'/
lrwxrwxrwx 1 REDMOND+jabaile 4096  78 Apr 16 10:44  d3-axis -> '/e/work/DefinitelyTyped/node_modules/.pnpm/d3-axis@3.0.0/node_modules/d3-axis/'/
lrwxrwxrwx 1 REDMOND+jabaile 4096  80 Apr 16 10:44  d3-scale -> '/e/work/DefinitelyTyped/node_modules/.pnpm/d3-scale@4.0.2/node_modules/d3-scale/'/
lrwxrwxrwx 1 REDMOND+jabaile 4096  88 Apr 16 10:44  d3-selection -> '/e/work/DefinitelyTyped/node_modules/.pnpm/d3-selection@3.0.0/node_modules/d3-selection/'/
lrwxrwxrwx 1 REDMOND+jabaile 4096  78 Apr 16 10:44  d3-time -> '/e/work/DefinitelyTyped/node_modules/.pnpm/d3-time@3.1.0/node_modules/d3-time/'/

So, if git clean had been implemented from a shell script rather than in C, it too would work.

Even powershell's ls shows these as links, not directories:

    Directory: E:\work\DefinitelyTyped\node_modules

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----           4/16/2023 10:44 AM                .bin
d----           4/16/2023 10:45 AM                .pnpm
d----           4/16/2023 10:44 AM                @definitelytyped
d----           4/16/2023 10:44 AM                @eslint
d----           4/16/2023 10:44 AM                @eslint-community
d----           4/16/2023 10:44 AM                @octokit
d----           4/16/2023 10:44 AM                @types
d----           4/16/2023 10:44 AM                @typescript-eslint
l----           4/16/2023 10:44 AM                comment-json -> E:\work\DefinitelyTyped\node_modules\.pnpm\comment-json@4.2.3\node_modules\comment-json\
l----           4/16/2023 10:44 AM                d3-array -> E:\work\DefinitelyTyped\node_modules\.pnpm\d3-array@3.2.3\node_modules\d3-array\
l----           4/16/2023 10:44 AM                d3-axis -> E:\work\DefinitelyTyped\node_modules\.pnpm\d3-axis@3.0.0\node_modules\d3-axis\
l----           4/16/2023 10:44 AM                d3-scale -> E:\work\DefinitelyTyped\node_modules\.pnpm\d3-scale@4.0.2\node_modules\d3-scale\
l----           4/16/2023 10:44 AM                d3-selection -> E:\work\DefinitelyTyped\node_modules\.pnpm\d3-selection@3.0.0\node_modules\d3-selection\
l----           4/16/2023 10:44 AM                d3-time -> E:\work\DefinitelyTyped\node_modules\.pnpm\d3-time@3.1.0\node_modules\d3-time\
...

It's definitely true that these junctions behave weirdly in Explorer, but, I feel like following Explorer isn't the right thing for git. I don't know what operations would be harmed by changing the behavior to match WSL and git-bash, but it seems to me like there would be a benefit.

But, I'm absolutely not an expert; my goal here is to get some sort of resolution to unblock our transition of DefinitelyTyped to a monorepo, and git clean not working for our Windows users is really unfortunate.

Both pnpm and yarn use junctions by default, likely due to the history of Windows' symlinks previously needing elevated priviledges. However, modern versions of Windows (when in "developer mode") allow symlinks to be created by anyone, so another angle I can try is try and update the package managers to detect this and use "real" symlinks instead, but this seems a lot more challenging and more annoying to have to tell users to enable Developer Mode to get things working.

@jakebailey
Copy link
Author

Just some further data, but:

So for me, reading the above combined with WSL and git-bash's behavior, I feel like "junctions are symlinks" is the right move for maximum consistency.

The thread starting at #437 (comment) is pretty long, so forgive me if I've missed something, but it seems like it's the case that git itself won't make junctions anyhow, so I'm not totally sure what could go wrong by not traversing downward. Maybe the behavior if you attempt to check in one of these things is the problem?

@jakebailey
Copy link
Author

jakebailey commented Apr 16, 2023

Hm, I finally read to the bottom of that thread, and through multiple cross references found: #2268

So, it seems like it isn't intentional that git clean goes down junctions? Maybe figuring out what changed there is the "real" fix and we can forego the whole "are junctions symlinks" discussion...

But, I'm pretty sure that test is still there and passing, so, maybe it's a different thing entirely. Technically already, git clean succeeds, but not before it attempts to walk down the entire path until it hits the length limit and warns. That's the bad part.

But, I'm not sure that PR has an affect, becuase by the time the git clean code gets to things, finddata2dirent has already changed the type to DIR.

@dscho
Copy link
Member

dscho commented Apr 19, 2023

it seems like it isn't intentional that git clean goes down junctions?

Yep: #1976

As I said, I think we'll need a new Git attribute. That would make it possible to configure the behavior you desire, without re-breaking the bug reported in #607.

@jakebailey
Copy link
Author

As I said, I think we'll need a new Git attribute.

Truthfully I'm not sure how to implement that. The type is set way way down at the root of the thing that translates the Windows API into the more POSIX-y one git uses, so that to me means that there's a setting that affects the APIs needed to read the settings, i.e. in order to know if junctions are symlinks, we have to read the filesystem to pull the attribute config.

That would make it possible to configure the behavior you desire, without re-breaking the bug reported in #607.

I'm not sure I understand this reference; the linked issue seems to be asking for the same behavior I'm asking for, so I can't really re-break if it's already broken, more or less. Maybe I'm missing something in the thread; there's a commit mentioning that issue which has caused github to hide some 3000+ comments.

I still disagree in principle that junctions should be treated differently; I think that it's telling that if git-clean had been a shell script that it would behave the way I (and those in #607) expect because that's how git-bash is implemented. The only behavior to me that seems different is Explorer copying, otherwise, deletion, traversal, are all identical.

(In fact, I'm now moderately worried about what git rm might do, if it's treating these as directories and recursing. If it recurses and modifies what has been "symlinked", that's very, very different than a hard link.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Recusive Directory Junctions causes git clean -fd to return a warning
4 participants