fix issue #6377 #6379

Closed


Ren-hongchen

fix #6377
The file is not overwritten because the test file in the TMP directory is not removed after the first test run, and the temporary file's directory and filename are the same on every run.

This check skips the overwrite when the file already exists:

if not os.path.exists(extracted_path): in utils.extract_zipped_paths()

Doesn't this mean that calling the function twice in a row cannot detect changes in the compressed file?

I thought that check might be redundant, so I removed it.
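To make the behavior described above concrete, here is a minimal sketch, not the code from requests. `extract_if_missing`, `write_zip`, and `demo_stale_extraction` are hypothetical names, and the helper only imitates the skip-if-exists check using the standard `zipfile` module.

```python
import os
import tempfile
import zipfile


def extract_if_missing(zip_path, member, target_dir):
    """Hypothetical stand-in for a skip-if-exists extraction step."""
    extracted_path = os.path.join(target_dir, os.path.basename(member))
    # Same shape as the check quoted above: extract only when nothing is there yet.
    if not os.path.exists(extracted_path):
        with zipfile.ZipFile(zip_path) as zip_file:
            data = zip_file.read(member)
        with open(extracted_path, "wb") as file_handler:
            file_handler.write(data)
    return extracted_path


def write_zip(zip_path, member, payload):
    """Create (or recreate) an archive holding a single member."""
    with zipfile.ZipFile(zip_path, "w") as zip_file:
        zip_file.writestr(member, payload)


def demo_stale_extraction():
    """Extract, change the archive, extract again, and return both reads."""
    with tempfile.TemporaryDirectory() as workdir:
        zip_path = os.path.join(workdir, "bundle.zip")
        target_dir = os.path.join(workdir, "out")
        os.mkdir(target_dir)

        write_zip(zip_path, "data.txt", b"first version")
        with open(extract_if_missing(zip_path, "data.txt", target_dir), "rb") as f:
            first_read = f.read()

        # The archive content changes, but the extracted file already exists.
        write_zip(zip_path, "data.txt", b"second version")
        with open(extract_if_missing(zip_path, "data.txt", target_dir), "rb") as f:
            second_read = f.read()

        return first_read, second_read
```

In this sketch the second call returns the path to the file written by the first call, so both reads yield the old content.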

@sigmavirus24
Contributor

This will dramatically kill performance for people using certifi and requests from zip files. We won't be accepting it

@Ren-hongchen Ren-hongchen deleted the extract_zipped_paths branch March 18, 2023 12:53
@Ren-hongchen
Author

Ren-hongchen commented Mar 24, 2023

This will dramatically kill performance for people using certifi and requests from zip files. We won't be accepting it

Sorry, I'm a noob. @sigmavirus24
Would it be acceptable to use MD5 to compare the existing file in the temporary directory with the file in the archive, to detect whether its contents have changed?

import hashlib
import os

def extract_zipped_paths():
    ...
    # Read the member once so the same bytes are hashed and, if needed, written.
    member_bytes = zip_file.read(member)

    if os.path.exists(extracted_path):
        with open(extracted_path, "rb") as file_handler:
            file_hash = hashlib.md5(file_handler.read()).hexdigest()
        zip_file_hash = hashlib.md5(member_bytes).hexdigest()

        # Reuse the existing extracted file only when its content matches the archive.
        if file_hash == zip_file_hash:
            return extracted_path

    # use read + write to avoid creating nested folders; we only want the file, which avoids an mkdir race condition
    with atomic_open(extracted_path) as file_handler:
        file_handler.write(member_bytes)
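For anyone who wants to try the idea outside of requests, here is a self-contained sketch of the same comparison. `extract_member_if_changed` and `_md5_hex` are hypothetical names, and the temp-file-plus-`os.replace` write is only an assumed stand-in for `atomic_open`, not its actual implementation.

```python
import hashlib
import os
import tempfile
import zipfile


def _md5_hex(data):
    """Hex MD5 digest of a bytes object (change detection only, not security)."""
    return hashlib.md5(data).hexdigest()


def extract_member_if_changed(zip_path, member, target_dir):
    """Extract `member` into `target_dir`, rewriting only when the content differs.

    Returns (extracted_path, rewritten).
    """
    extracted_path = os.path.join(target_dir, os.path.basename(member))

    with zipfile.ZipFile(zip_path) as zip_file:
        member_bytes = zip_file.read(member)

    if os.path.exists(extracted_path):
        with open(extracted_path, "rb") as file_handler:
            if _md5_hex(file_handler.read()) == _md5_hex(member_bytes):
                # Content is unchanged, so keep the existing file.
                return extracted_path, False

    # Assumed stand-in for an atomic write: write to a temporary file in the
    # same directory, then rename it over the destination.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "wb") as tmp_handler:
            tmp_handler.write(member_bytes)
        os.replace(tmp_path, extracted_path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return extracted_path, True
```

Note that this sketch still opens the archive and reads the member on every call, so it only avoids the write, not the read. Since both byte strings are already in memory, comparing them directly would give the same answer as comparing their digests.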

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 26, 2024
Development

Successfully merging this pull request may close these issues.

test_zipped_paths_extracted fails if test file has been modified since last run