Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace fast-glob with custom fs.walk #1210

Merged
merged 3 commits into from
Apr 19, 2024
Merged

Conversation

Blugatroff
Copy link
Contributor

Addresses #1182

Description of the change

This implementation is using micromatch (a dependency of fast-glob) and fs.walk to traverse the file system, instead of relying on fast-glob. As opposed to the previous implementation (from #1209) This implementation takes every .gitignore file into account, not just the root one. The callbacks entryFilter and deepFilter are used to control which directories to recurse into. When entryFilter encounters a .gitignore file, it's patterns are parsed into micromatch compatible ones and every subsequent call to these filter functions respects them.

The previously used Glob.match' also returned two arrays one called failed and one called succeeded. I'm not sure what it means for a path to be in the failed array and this implementation does not have any equivalent. Because of that, this warning will no longer be emitted by spago

I'm not sure whether this should have been (partially) implemented in the registry-dev repo since that's where the FFI for fast-glob is located.

I also did not remove the fast-glob npm dependency, since i'm not sure if it is still in use elsewhere and because the registry-dev dependency definitely still contains the foreign functions relying on it being available.

Before i go any further and perhaps add tests, i wan't to get some feedback as to whether this is the right approach and place :)

Checklist:

  • Added the change to the "Unreleased" section of the changelog
  • Added some example of the new feature to the README
  • Added a test for the contribution (if applicable)

Blugatroff and others added 3 commits April 6, 2024 00:09
Addresses purescript#1182

This implementation is using `micromatch` (a dependency of fast-glob)
and `fs.walk` to traverse the file system itself, instead of relying on fast-glob.
As opposed to the previous implementation (from purescript#1209) This implementation takes
every .gitignore file into account, not just the root one.
The callbacks `entryFilter` and `deepFilter` are used to control which
directories to recurse into. When `entryFilter` encounters a .gitignore
file, it's patterns are parsed into micromatch compatible ones and every
subsequent call to these filter functions respect them.

I'm not sure whether this should have been (partially) implemented in the
registry-dev repo since that's where the FFI for fast-glob is located.

I also did not remove the fast-glob npm dependency, since i'm not sure if
it is still in use elsewhere and because the registry-dev dependency
definitely still contains the foreign functions relying on it being available.
@f-f
Copy link
Member

f-f commented Apr 19, 2024

Wow - on my (very fast) machine the mainline Spago spends ~150ms crawling the filesystem for files, but with this patch that time is only ~10ms. That's a 1500% improvement I think?
Great work ❤️

@f-f
Copy link
Member

f-f commented Apr 19, 2024

@Blugatroff the patch looks good! A few general notes:

The previously used Glob.match' also returned two arrays one called failed and one called succeeded. I'm not sure what it means for a path to be in the failed array and this implementation does not have any equivalent. Because of that, this warning will no longer be emitted by spago

The registry needs to check that paths are "sanitised", i.e. if they are self-contained in a directory, to make sure that if it's looking at some source contained in /some/package/path, then there can't be any symlinks "escaping" from there and pointing at some other thing e.g. in /some, which it shouldn't be allowed to access.

Spago might not have the same requirements, since things are definitely more local - while the registry has tighter security requirements, having to accept basically untrusted inputs.
I'd add we recently encountered an issue with this sanitisation with our build at work, where symlinks would not allow us to build even if things were in the right place (cc @srstrong).
So I'd say we can roll with this for now and see where it gets us (and remove that now-unused warning too)

I'm not sure whether this should have been (partially) implemented in the registry-dev repo since that's where the FFI for fast-glob is located.

As noted above, the requirements in spago might be different than the ones in the registry repo, so I'd say it's fine to have two separate implementations.

I also did not remove the fast-glob npm dependency, since i'm not sure if it is still in use elsewhere and because the registry-dev dependency definitely still contains the foreign functions relying on it being available.

The fast-glob npm dependency and the registry-foreign PureScript dependency are definitely good to be removed, since this globbing code was the only thing we were using from there.

@f-f
Copy link
Member

f-f commented Apr 19, 2024

Oh, and no worries about adding new tests - as long as the performance looks better and we don't break the expectations of the existing test suite then this is good to go

Copy link
Member

@f-f f-f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I'll go ahead and merge this - we can remove the unused packages in a followup patch. Thanks!

@f-f f-f merged commit 56a36d5 into purescript:master Apr 19, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants