Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add caching for YAML.safe_load_file #375

Closed
jesse-shopify opened this issue Sep 28, 2021 · 7 comments
Closed

Add caching for YAML.safe_load_file #375

jesse-shopify opened this issue Sep 28, 2021 · 7 comments

Comments

@jesse-shopify
Copy link

I made sure the issue is in bootsnap

I learned today that files loaded by YAML.safe_load_file are not cached. Searching this repository suggests this is the case with YAML.load_file and YAML.unsafe_load_file being supported but not YAML.safe_load_file.

Steps to reproduce

The application where this is relevant is using the Faker gem, thus stuck at Psych 3.3. To use safe_load_file in Psych 3.3 it has to be called explicitly (in Psych 4.0 it is the default for load_file).

Expected behavior

Files loaded with YAML.safe_load_file should be cached (at least for simple use cases).

Actual behavior

Files loaded with YAML.safe_load_file are not cached.

System configuration

Bootsnap version: 1.9.1

Ruby version: 2.6.3, 2.7.1

Rails version: 6.1.4.1

@casperisfine
Copy link
Contributor

It wasn't done because to be efficient, the caching must be doable statically through the precompile command, so at this stage we have no idea what kind of arguments will be used.

So at best we could support safe loading only when a strict subset of arguments are used.

@casperisfine
Copy link
Contributor

Arf, I remember now why I didn't do it.

For options such as alias: false, that information is lost once the cache is compiled, so for instance:

YAML.safe_load_file("foo.yml") # raises because the file has aliases or other unsafe constructs
YAML.unsafe_load_file("foo.yml") # works fine and the cache is stored
YAML.safe_load_file("foo.yml") # would succeed because it would load the cache directly.

I really don't see any way around this.

@jesse-shopify
Copy link
Author

Isn't this already an issue with Psych 4 since load_file calls load which calls safe_load?

If I'm reading this right, the only possible supported options are symbolize_keys and ```freeze``, any other option will not hit the cache (also good to know):

If that's the case then passing alias will always avoid the cache. An implementation of safe_load_file identical to load_file and unsafe_load_file seems like it would be sufficient for the common use case.

That would mean load_file and safe_load_file would be replaced with unsafe_load, but that's already the case (when using Psych 4), and presumably, acceptable because content is local and trusted.

The only issue is if any code parses an untrusted, user controlled yml file using safe_load_file with the expectation it will be safely parsed. But that's already an issue with Psych 4 using load_file.

I don't have the answer and recognize this is tricky.

@casperisfine
Copy link
Contributor

Isn't this already an issue with Psych 4 since load_file calls load which calls safe_load?

No because load_file isn't cached when we detect such case:

if Patch.method_defined?(:load_file) && ::YAML::VERSION >= '4'
Patch.send(:remove_method, :load_file)
end

If that's the case then passing alias will always avoid the cache.

The problem isn't when aliases: true is passed, but when it's not passed.

@jesse-shopify
Copy link
Author

jesse-shopify commented Oct 1, 2021

Ah, neat. So is this an accurate summary?

  • psych 3
    • cached (conditionally): load_file unsafe_load_file
    • uncached: safe_load_file
  • psych 4
    • cached (conditionally): unsafe_load_file
    • uncached: load_file safe_load_file

@casperisfine
Copy link
Contributor

It is yes. One thing though unsafe_load_file was added in 3.3.something to ease the migration. SO Some Psych 3 might not have it. Hence why the best test is YAML.respond_to?(:unsafe_load_file)

@casperisfine
Copy link
Contributor

So #392 fixes most of this. It still doesn't cache YAML.safe_load_file as I don't think that method makes much sense (if you use safe_load_file you basically don't trust that file, so it has no place it bootsnap cache), however it does cache YAML.load_file on Psych 4, which is in common use.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants