Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new filtering fs: FilePredicateFs #309

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

satotake
Copy link

While golang's built-in regexp package has the advantage, constant time guarantees,
it (RE2) has some limitations to express complex rules compared to other languages (PCRE2).
(For example, it does not have backtracking.)
Here, I would like to introduce a new filtering backend, PredicateFs, to make afero's filtering more flexible and composable.

PredicateFs is a filtered view on predicates which takes file path as an argument.
Any file will be treated as non-existing when the predicate returns false.
Unlike RegexpFs, the fs targets not file name but file path.
Like, RegexpFs, files will not be created when the predicate returns false and directories are always not filtered.

A filtered view on predicates which takes file path as an argument,
any file will be treated as non-existing when the predicate returns false.
@CLAassistant
Copy link

CLAassistant commented Jun 19, 2021

CLA assistant check
All committers have signed the CLA.

@jxsl13
Copy link

jxsl13 commented Jan 9, 2022

This one is pretty nice as it may filter according to any kind of path related rule, not just a regular expression.

You removed a #from the readme of HttpFs

Imo it shouldn't handle directories in any special way.
Might as well provide access to the internal Fs in the predicate function in order to e.g. check if the file is a directory or do other checks with the file path. This might need some synchronozation mechanism, I'd guess or one assumes that the Fs is goroutine safe.
I wonder what the thoughts behind that reasoning were?
Same as RegexpFs?

I could think of a use case where I want my Fs to be only able to interact within specific directories somewhat like BasePathFs but with multiple directories e.g. provided in a regular expression or in a list.

@satotake
Copy link
Author

satotake commented Jan 10, 2022

You removed a #from the readme of HttpFs

This is not related with this request directly but I think HttpFs is not filtering fs.

I wonder what the thoughts behind that reasoning were?
Same as RegexpFs?

Good point.
Though I am not sure of the rationales of RegexpFs, probably they are the same.

Consider the situation where we

  • want to include only files with .go extension
  • use fs.WalkDir defined walk function, which calls Open internally

(yes, this will be minor)

If we apply the function with both files and dirs, strings.HasSuffix(s, ".go") will be wrong for files in sub dir.
dir/example.go is skipped surprisingly(?) because strings.HasSuffix("dir", ".go") is false.
This is easy to go wrong in my opinion.

Instead of it, as you describe, it will be clear to combine PredicateFs with BasepathFs etc.
Or, we can handle directories as I wrote the test though it is a little verbose (.hidden).

@jxsl13
Copy link

jxsl13 commented Jan 10, 2022

In case that you provide the underlying Fs as parameter to the predicate function, you might be able to check whether the file path is actually a file or a directory with a Stat call and then go for the file's suffix.

^
I think that this would make things way more complicated, for sure and potentially more error prone.

Might as well rename PredicateFs to FilePredicateFs in order to make it clearer that this abstraction is only targeting files.

@satotake
Copy link
Author

satotake commented Jan 10, 2022

@jxsl13 thanks for reviewing.
Replaced PredicateFs with FilePredicateFs

@satotake satotake changed the title Add new filtering fs: PredicateFs Add new filtering fs: FilePredicateFs Jan 10, 2022
@bep
Copy link
Collaborator

bep commented Jul 14, 2022

Sorry for being very late to the game, this looks very useful. I have read the discussion above, but I suspect that we're missing out on some potential when we apply this to files (and not dirs) only. If I walk a directory I certainly want to be able to skip a dir before reading it.

How about making the predicate:

func(isDir bool, path string) bool

@satotake
Copy link
Author

@bep thank you.
I referred to RegexpFs and it skips all directories internally.
So does this fs.

predicatefs.go Outdated
return nil, err
}
for _, i := range pfi {
if i.IsDir() || f.pred(filepath.Join(f.f.Name(), i.Name())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@satotake I'm not sure what you mean by "skip directories"; to me it looks like every directory is included.

I would have found it much more useful if the above was replaced with:

if f.pred(i.IsDir(), filepath.Join(f.f.Name(), i.Name())) {

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One might as well just pass the file info object instead of a bool which can then be manually checked whether the path is a directory or a file or a symlink, etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would break if you want to use this with fs.ReadDirFile (see my recent commit to the main branch).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bep I got your point.
isDir will be nice to ignore some dirs without scanning their children, for instance.
Fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants