Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Incorrect Dates on some Gem::Specification #6859

Closed

Conversation

Daniel-N-Huss
Copy link

Hey there! 👋

Thanks for taking a bit of time to read this PR. I took a look at the code of conduct, and contribution guidelines and I believe everything here should be in line!

What was the end-user or developer problem that led to this PR?

Issue number #6815 regarding incorrect dates on GemSpecs, with recent Puma versions appearing to be built in 1980.
Another discussion happened on the Puma gem here.

A thread in the puma discussion showed that behind the scenes, using Nix as a system config / package manager sets a default SOURCE_DATE_EPOCH environment variable to encourage build stability. It's a noticeable, and unique value, for reasons related to other Nix dependencies.

In spirit of the proposal in #2290 for reproducible builds, and attending to backwards-compatible considerations from this comment in #3088 to avoid changing the SOURCE_DATE_EPOCH variable, I've put a barebones conditional check forward.

What is your fix for the problem, implemented in this PR?

This PR adds to the check for an existing SOURCE_DATE_EPOCH; if nothing is set, or it's an empty string, Gem will now explicitly ignore this specific default date.

I wanted to get something concrete to start a discussion on, but I think this exact implementation could be a problematic fix for gem versions that have been built with the Nix default. By explicitly ignoring those invalid dates, doing a rebuild of any gems based on that default date will result in a non-equivalent build.

It may be preferable to put a hard stop to any gems that defaulted to that date, in which case perhaps I didn't go far enough in this PR. If we would like to catch and remediate any of these mistaken dates perhaps we want to go as far as raising an error if a gem author tries to do a rebuild explicitly on that nix date.

Given that rubygems handles the build date automatically, it felt relevant to me that a fix for this type of build error happen at this layer. I think there could be some room to discuss moving from defaulting to the "current date", to check the git history for the date of the most recent revision on the branch as a reasonable default for builds. Thank you to @penguincoder for the discussion, and suggesting using git refs, which aligns with other OSS approaches to consistent builds with epoch times.

If the rubygems team is open to a git-based approach to SOURCE_DATE_EPOCH, I'd be pleased to open a standalone discussion and attempt an implementation. I left that approach out of this PR, as it felt like closing the gap felt like the pertinent first step, but I'm open to feedback! 😁

Make sure the following tasks are checked

  • Describe the problem / feature
  • Write tests for features and bug fixes

Noting here I had an error running the whole suite locally, with a failure on checking the manifest was up to date, but running the update manifest rake task didn't change anything 😄 Not sure if I made a mistake somewhere, but I can investigate a bit further if needed!

@welcome
Copy link

welcome bot commented Aug 4, 2023

Thanks for opening a pull request and helping make RubyGems and Bundler better! Someone from the RubyGems team will take a look at your pull request shortly and leave any feedback. Please make sure that your pull request has tests for any changes or added functionality.

We use GitHub Actions to test and make sure your change works functionally and uses acceptable conventions, you can review the current progress of GitHub Actions in the PR status window below.

If you have any questions or concerns that you wish to ask, feel free to leave a comment in this PR or join our #rubygems or #bundler channel on Slack.

For more information about contributing to the RubyGems project feel free to review our CONTRIBUTING guide

@penguincoder
Copy link

I'm kind of impressed that the eventual solution is so small (including a test!). Great work @Daniel-N-Huss!

@deivid-rodriguez
Copy link
Member

Thank you @Daniel-N-Huss for the investigation and patch!

I'm probably missing context but to me this seems like the wrong place to fix this problem. What if Nix eventually changes this magic number, or if another similar tool starts using a different magic number, do we need to keep adding special cases? I feel RubyGems is doing exactly what's intended here.

Maybe Nix should allow users to opt out of this default?

Anyways, I'll let @duckinator chime in since they coded this feature originally.

By the way, I'd be in favor of switching to using git-based source date epoch.

@Daniel-N-Huss
Copy link
Author

Thank you @deivid-rodriguez!

To your point:

I'm probably missing context but to me this seems like the wrong place to fix this problem. [...] I feel RubyGems is doing exactly what's intended here.

I'm of two minds here, and I think this a fair point.
It felt weird to write a conditional that's so specific-to-Nix inside RubyGems.

While digging around the gem-building process, I was surprised to see the lack of a date attribute in GemSpecification, or the reference guide.

If we're trusting the build process to perform some magic to automatically handle dates, then I think RubyGems might be on the hook to absorb some of the responsibility for working around different dates / system configurations. Perhaps a move to a git ref date makes this a non-issue!

Or, perhaps this is more of a documentation issue? Alerting gem authors that their gem build relies on system configured dates, or that source epoch? Apologies if I missed a source of clarity here 😁

@Daniel-N-Huss
Copy link
Author

As a fun aside, I was curious why the rubygems.org repo didn't have similar problems with reporting dates on gem versions.

This check, from the early days of migrating gems onto the service, ensures those mistaken build dates aren't trusted, and so the UI shows the more correct 'when the gem was uploaded' date.

@penguincoder
Copy link

When Nix builds software, it changes all dates and times to the epoch. The reasoning is so that if the content of two files are identical, then they will hash to identical values. The hashing process is sensitive to the timestamp on the file, so it sets it to a known value so that time does not factor into the hashing process. It also won't be possible to opt-out of this change, as this is a core pillar of the entire nix foundation. I would tend to doubt that the magic number will ever change. You can safely assume that if it does change that it will be a breaking change for the entire Nix ecosystem and there will be bigger issues to tackle.

I understand the desire to avoid hard-coding a date, because "where do you stop?". But it kind of makes sense to not trust the date/time on the file. Every time we run CI checks, it checks out a fresh copy of the code in the GHA runner, and each time the files have the timestamps of the CI run, not of the time-the-file-was-created-or-modified-in-git. You can really only trust the timestamps on the files if you are using the original source repository that exists on the developer's machine that created the file. Otherwise, the times will be arbitrary. This could be the epoch, the 1980-equivalent in nixpkgs, or the CI run start time... It won't convey the meaning you are wanting to extract from that value.

As an aside, you can test this out. check out any project with a Dockerfile. Have a friend do the same thing. Now coordinate with your friend, and go type docker build ... at as close to the exact same time as you can manage, and then see if you two get the same digest at the end.

Using the last-modified-time from the latest git commit will be the most reliable method to get the gem date. This also assumes the project uses git, but there are equivalents for other VCSs, too.

@deivid-rodriguez
Copy link
Member

That's an interesting find there @Daniel-N-Huss about why RubyGems.org didn't inherit this issue 👍. Also, it's good news that it doesn't rely on gems' build date, since that means if we switch to git-based build times, it won't show inaccurate/confusing publication dates.

In any case, I believe this feature is still half baked in RubyGems? You cannot yet reproduce a build easily (let alone make that the default behavior), but I think @duckinator's initial approach at #4913 was indeed to use the modification time of the package.

Anyways, I'd rather go with the git-based approach in the long term. After all, this particular issue is quite minor, right? It's just that if you deeply inspect the specification of some gems like puma, you'll end up seeing this date (which on the other hand, makes it easy to reproduce builds of puma :). As far as I understand, it has gone unnoticed for more than a year.

But let's wait to hear for @duckinator's input!

Thanks both again for your investigation, patch, and comments.

@duckinator
Copy link
Member

Thanks for digging into this. I'm not sure this is the right approach, mostly because Nix sets SOURCE_DATE_EPOCH to this value specifically to allow reproducible builds even if the end-user doesn't care at the time of making the initial build. And this patch would break that entirely.

My planned approach for gem rebuild in #4913 is to use the modification time of the package, but it's worth noting that the date being correct doesn't usually matter in practice — the more import part is being able to control the value, so build artifacts (in this case, .gem files) have the same hash when you rebuild them.

I don't think the date being wrong causes an actual problem in practice. To me, with my current understanding of the situation, it'd makes more sense to address this by having the system building Puma releases set SOURCE_DATE_EPOCH themselves, since Nix should never override it.


To be clear: I'm not opposed to making a change on our end if it makes sense. I'm just not sure it does make sense, and don't have the time to properly dig into this right now. If I don't follow up here by the 11th, feel free to @-mention me as a reminder.

Also: I suspect having RubyGems default to using information from git would cause more problems than it solves, since we already have gemspecs that rely on git causing people problems by adding git as a build dependency for most gems.

@duckinator
Copy link
Member

duckinator commented Aug 9, 2023

I've been thinking about this for the last few days. I have a few thoughts I want to note real quick.


I think there's 2 things that definitely make sense on the RubyGems side of things:

  1. Add a comment to rubygems.org's rely_on_built_at? noting that it is also useful for catching obviously-nonsense build dates (e.g. any date before 2003 is obviously wrong because RubyGems didn't exist). This existing code prevents the bogus dates turning into confusion for people casually browsing RubyGems.org, and it makes sense to me for this to officially become part of its job.
  2. Everything around gems and reproducible builds needs documentation.

Regarding Puma and other packages that want reproducible builds and correct dates:

While I dislike having RubyGems determine the date using git automatically, setting SOURCE_DATE_EPOCH using the date of the latest Git revision included is a reasonable method.

It might make sense to add a flag for gem build (and maybe the tasks added by bundler/gem_tasks?) to opt-in to using git for this.

@@ -140,6 +140,20 @@ def test_build_time_without_source_date_epoch
ENV["SOURCE_DATE_EPOCH"] = epoch
end

def test_build_time_ignores_invalid_source_date_epoch
ENV["SOURCE_DATE_EPOCH"] = "315532800"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This environment variable should be restored after this test, like the other related tests.

@Daniel-N-Huss
Copy link
Author

Daniel-N-Huss commented Aug 16, 2023

I appreciate the feedback and discussion here! Thanks for taking the time 😄

  1. Add a comment to [rubygems.org's rely_on_built_at?]

I popped up a brief PR on that side to add this clarification!

  1. Everything around gems and reproducible builds needs documentation.

I'd agree, and feel this is a better place to start than the magic string implementation I originally put up here.
I'm happy to start putting a bit of detail together on what exists now.

Adding a note to the specification reference, in the same way that rubygems_version is noted is the most obvious place I'll start with. Do you imagine other places you'd want to share details about the feature?

It might make sense to add a flag for gem build (and maybe the tasks added by bundler/gem_tasks?) to opt-in to using git for this.

To your earlier point about avoiding a hard dependency on git, I noticed the bundler gem introduces a soft 'dependency' on git to resolve the git_commit_sha for its build metadata. Does this type of soft dependency seem reasonable for rubygems, if it's also put behind an opt-in flag, and gracefully returns to the current behaviour?

If so, it sounds like the two pieces of this would include:

  • a command, such as gem build --reproducible, that looks to set SOURCE_DATE_EPOCH from the most recent revision date, and gracefully falls back to the existing behaviour if it cannot.
  • extending the command reference for gem build to indicate that flag's behaviour, highlighting a set SOURCE_DATE_EPOCH in the build can also work.

I've done a cursory look at an implementation of a flag for gem build, and am happy to dedicate some more time to that type of approach over the next week or so!

@deivid-rodriguez
Copy link
Member

@duckinator Did adding the gem rebuild command change anything here? Should this PR be closed?

@duckinator
Copy link
Member

Nix approached SOURCE_DATE_EPOCH as: if it's unset, setting it to a fixed value.

RubyGems approached it as:

  1. if it's a rebuild, try to guess the original value and use that.
  2. if it's unset AND not a rebuild, set it to the current timestamp.

This means when Nix is used to build a Gem, it's set to Nix's fixed value in an environment where RubyGems' vaguely-"real" value is expected.

I don't think it makes sense to fix this in RubyGems. If you want RubyGems to set the value, you can just unset SOURCE_DATE_EPOCH.

@duckinator duckinator closed this Apr 29, 2024
@duckinator
Copy link
Member

I also feel like this would just fundamentally break reproducible gem builds in Nix? Because those rebuilds use Nix infrastructure, not RubyGems infrastructure, so they'd be expecting it to always have that value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants