Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure symlinked realpaths are not loaded twice #3189

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

rwstauner
Copy link
Collaborator

closes #3138

Use a hash to cache realpath lookups
and another to cache realpath -> feature.

This is based off of what I was reading in the CRuby code:
https://github.com/ruby/ruby/blob/499eb3990faeaac2603787f2a41b2d9625e180dc/load.c#L358-L402
https://github.com/ruby/ruby/blob/499eb3990faeaac2603787f2a41b2d9625e180dc/load.c#L1195-L1289
but I could easily be wrong about something here.

Let me know if you see problems with the approach or a better way to do something here.

I did have a few questions remaining:
Should the body of the load_unless_realpath_loaded method be synchronized? Does it make sense to wrap it in with_synchronized_features?

There was a commit in CRuby:
ruby/ruby@972f274#:~:text=To%20avoid%20concurrency%20issues%20when%20rebuilding%20the%20loaded%20features%0Aindex%2C%20the%20building%20of%20the%20index%20itself%20is%20left%20alone%2C%20and%0Aafterwards%2C%20a%20separate%20loop%20is%20done%20on%20a%20copy%20of%20the%20loaded%20feature%0Asnapshot%20in%20order%20to%20rebuild%20the%20realpaths%20hash.

To avoid concurrency issues when rebuilding the loaded features
index, the building of the index itself is left alone, and
afterwards, a separate loop is done on a copy of the loaded feature
snapshot in order to rebuild the realpaths hash.

Do we not need to worry about that here because it's already being called inside with_synchronized_features? Or would that still be a concern?

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jul 27, 2023
@rwstauner rwstauner force-pushed the rwstauner/loaded-features-realpaths branch from 1df367c to 89e47ef Compare July 27, 2023 20:03
@eregon
Copy link
Member

eregon commented Jul 28, 2023

Design-wise that change looks pretty hacky to me.
The simplest for TruffleRuby would be to realpath when adding to $LOADED_FEATURES.
But that may not be compatible enough, unsure.

@@ -140,6 +143,16 @@ def self.find_feature_or_file(feature, use_feature_provided = true)
end
end

def self.load_unless_realpath_loaded(feature, path)
# TODO: does this need to be synchronized?
realpath = (@features_realpath_cache[path] ||= File.realpath(path) || path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK File.realpath never returns false|nil, always a String, or an exception if some part does not exist.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That did seem to be the case for me but I wasn't sure if it was true on all platforms.
I think I was looking at this C code and it made me want to program defensively (the semantics are probably different, though).
https://github.com/ruby/ruby/blob/7ce4b716bdb5bcfc8b30ffcd034ce7aded1f72b9/load.c#L87-L88

@@ -140,6 +143,16 @@ def self.find_feature_or_file(feature, use_feature_provided = true)
end
end

def self.load_unless_realpath_loaded(feature, path)
# TODO: does this need to be synchronized?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes of course, you could use a TruffleRuby::ConcurrentMap.
But I think this code would need to move to Java anyway, because we cannot have an extra backtrace entry for require, everything must happen in Primitive.load_feature (or another Primitive reusing it).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was wondering if it should be under with_synchronized_features.

That's a good point, though, might be able to do the whole change in java 🤔

@@ -35,6 +35,9 @@ module FeatureLoader
# A snapshot of $LOADED_FEATURES, to check if the @loaded_features_index cache is up to date.
@loaded_features_version = -1

@features_realpath_cache = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this cache is correct, for instance it seems broken if CWD changes or $LOAD_PATH changes.
Probably that part shouldn't be cached.

You could cache absolute_path->realpath though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, we probably want absolute path as the key.

@@ -292,13 +305,24 @@ def self.get_loaded_features_index
unless @loaded_features_version == $LOADED_FEATURES.version
raise '$LOADED_FEATURES is frozen; cannot append feature' if $LOADED_FEATURES.frozen?
@loaded_features_index.clear
previous_realpaths = @features_realpath_cache.dup
@features_realpath_cache.clear
@loaded_features_realpaths.clear
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this cache never needs to be cleared.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eregon
Copy link
Member

eregon commented Jul 28, 2023

If #3138 is not really an issue in practice I would probably delay this.
It seems pretty messy to do correctly.
I'd rather we convince CRuby to realpath all LOADED_FEATURES eagerly or do that and accept the small semantics difference (it's much simpler and likely more efficient).

@rwstauner
Copy link
Collaborator Author

The simplest for TruffleRuby would be to realpath when adding to $LOADED_FEATURES.
But that may not be compatible enough, unsure.

In my testing it seemed that the first time the realpath was stored it was saved to LOADED_FEATURES as whatever was actually requested.
Then if that realpath was requested again, it would be ignored.

I tried to follow what the ruby C code was doing, but it's probably more complicated than what I have so far here.

I'd rather we convince CRuby to realpath all LOADED_FEATURES eagerly or do that and accept the small semantics difference (it's much simpler and likely more efficient).

Might be worth a discussion 👍

@eregon
Copy link
Member

eregon commented Aug 16, 2023

Action item for me here is I need to file a CRuby issue to suggest using realpath for everything, it would be so much simpler and consistent, and have that discussed in a dev meeting.
In parallel we could try to implement the semantics of "everything realpath" in TruffleRuby and see if that's compatible enough.
In either case I think this is too risky to integrate for the 23.1 release.

Also would be useful to get an answer to #3138 (comment), cc @nirvdrum

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCA Verified All contributors have signed the Oracle Contributor Agreement. shopify
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Double loading symlinked files
2 participants