Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve timestamps when unstreaming dirs #966

Merged
merged 3 commits into from Jan 27, 2022

Conversation

AlanCoding
Copy link
Member

Connect #964

That fixes the linked issue with the limitation (discovered by the test) that we might miss a 2 second delta or less. This assumes the computer clock is perfectly synced on the machines that do the compressing and extracting.

Nonetheless, I think this opens up an option for AWX to get the job done. I show a demo in the test with the very_old.txt file, where I artificially set the mod and access time to 1 day in the past. It seems like that would work for the AWX use case, assuming that the two computers agree on the day, which doesn't sound like an unreasonable demand.

Ping @jbradberry

@AlanCoding AlanCoding requested a review from a team as a code owner January 19, 2022 21:06
@AlanCoding
Copy link
Member Author

I'm pretty happy with this in its current form. It does the job well enough for our purposes and tests are passing.

@Shrews Shrews added the needs_triage New item that needs to be triaged label Jan 25, 2022
@@ -85,6 +86,11 @@ def unstream_dir(stream, length, target_directory):

archive.extract(info.filename, path=target_directory)

# Fancy logic to preserve modification times
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a bit more context here to this comment as to why we need to preserve modifications time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this would be kind of a general expectation for extracting a zip file.

In the case of runner, we have a control & execution machine. The control machine creates the private_data_dir, the worker runs the job, and it gets streamed back. The modtimes can be used to distinguish if the control machine last modified the file, or if the file was modified during the job run.

In the AWX case, we use modification times to determine if new facts were written for a host. This is a big performance optimization for the case where the inventory has 10k hosts but you run against 1 host. Most playbooks will gather facts, and AWX maintains a fact cache in the database. A job run which should go super fast would get bogged down saving the rather substantial facts data structure for 10k hosts.

I have trouble putting that into a short sentence :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new Executor team (currently just myself and @eqrx) lack A LOT of context around design decisions in the current code base (much of it having to do with AWX where we have no experience or insight), so it makes maintaining it rather difficult. Just a simple sentence like "AWX uses modification times to determine if new facts were written for a host" would give us a lot of clarity when we see code like this where the purpose is not immediately clear.

@Shrews Shrews removed the needs_triage New item that needs to be triaged label Jan 25, 2022

# Assure modification times are same as original
for filename in ('a.txt', 'b.txt'):
assert os.path.getmtime(dest_dir / filename) == os.path.getmtime(dest_dir / filename)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will never fail. Don't you want to compare source_dir to dest_dir?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, great catch, thanks

@Shrews Shrews added the gate label Jan 27, 2022
Copy link
Contributor

@ansible-zuul ansible-zuul bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@Shrews
Copy link
Contributor

Shrews commented Jan 27, 2022

Approved. Seems like something we should backport after merged.

@ansible-zuul ansible-zuul bot merged commit 203ef43 into ansible:devel Jan 27, 2022
Shrews pushed a commit to Shrews/ansible-runner that referenced this pull request May 4, 2022
Preserve timestamps when unstreaming dirs

Connect ansible#964
That fixes the linked issue with the limitation (discovered by the test) that we might miss a 2 second delta or less. This assumes the computer clock is perfectly synced on the machines that do the compressing and extracting.
Nonetheless, I think this opens up an option for AWX to get the job done. I show a demo in the test with the very_old.txt file, where I artificially set the mod and access time to 1 day in the past. It seems like that would work for the AWX use case, assuming that the two computers agree on the day, which doesn't sound like an unreasonable demand.
Ping @jbradberry

Reviewed-by: Jeff Bradberry <None>
Reviewed-by: David Shrewsbury <None>
Reviewed-by: Alan Rominger <arominge@redhat.com>
Reviewed-by: None <None>
(cherry picked from commit 203ef43)
Shrews added a commit that referenced this pull request May 4, 2022
Backport of PR #966 

(cherry picked from commit 203ef43)

Co-authored-by: Alan Rominger <arominge@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants