New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Preserve timestamps when unstreaming dirs #966
Conversation
I'm pretty happy with this in its current form. It does the job well enough for our purposes and tests are passing. |
@@ -85,6 +86,11 @@ def unstream_dir(stream, length, target_directory): | |||
|
|||
archive.extract(info.filename, path=target_directory) | |||
|
|||
# Fancy logic to preserve modification times |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a bit more context here to this comment as to why we need to preserve modifications time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this would be kind of a general expectation for extracting a zip file.
In the case of runner, we have a control & execution machine. The control machine creates the private_data_dir, the worker runs the job, and it gets streamed back. The modtimes can be used to distinguish if the control machine last modified the file, or if the file was modified during the job run.
In the AWX case, we use modification times to determine if new facts were written for a host. This is a big performance optimization for the case where the inventory has 10k hosts but you run against 1 host. Most playbooks will gather facts, and AWX maintains a fact cache in the database. A job run which should go super fast would get bogged down saving the rather substantial facts data structure for 10k hosts.
I have trouble putting that into a short sentence :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new Executor team (currently just myself and @eqrx) lack A LOT of context around design decisions in the current code base (much of it having to do with AWX where we have no experience or insight), so it makes maintaining it rather difficult. Just a simple sentence like "AWX uses modification times to determine if new facts were written for a host" would give us a lot of clarity when we see code like this where the purpose is not immediately clear.
test/unit/utils/test_utils.py
Outdated
|
||
# Assure modification times are same as original | ||
for filename in ('a.txt', 'b.txt'): | ||
assert os.path.getmtime(dest_dir / filename) == os.path.getmtime(dest_dir / filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will never fail. Don't you want to compare source_dir
to dest_dir
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, great catch, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Approved. Seems like something we should backport after merged. |
Preserve timestamps when unstreaming dirs Connect ansible#964 That fixes the linked issue with the limitation (discovered by the test) that we might miss a 2 second delta or less. This assumes the computer clock is perfectly synced on the machines that do the compressing and extracting. Nonetheless, I think this opens up an option for AWX to get the job done. I show a demo in the test with the very_old.txt file, where I artificially set the mod and access time to 1 day in the past. It seems like that would work for the AWX use case, assuming that the two computers agree on the day, which doesn't sound like an unreasonable demand. Ping @jbradberry Reviewed-by: Jeff Bradberry <None> Reviewed-by: David Shrewsbury <None> Reviewed-by: Alan Rominger <arominge@redhat.com> Reviewed-by: None <None> (cherry picked from commit 203ef43)
Connect #964
That fixes the linked issue with the limitation (discovered by the test) that we might miss a 2 second delta or less. This assumes the computer clock is perfectly synced on the machines that do the compressing and extracting.
Nonetheless, I think this opens up an option for AWX to get the job done. I show a demo in the test with the
very_old.txt
file, where I artificially set the mod and access time to 1 day in the past. It seems like that would work for the AWX use case, assuming that the two computers agree on the day, which doesn't sound like an unreasonable demand.Ping @jbradberry