Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute hashes from all index servers #1556

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

stefansjs
Copy link

@stefansjs stefansjs commented Jan 25, 2022

Fixes #1536.

When pip-compile encounters one index server which supports the json API it does not go on to compute hashes on files from "simple" index servers. This change fetches hashes from all index servers, and computes hashes on all files that don't have one provided by a json API

Contributor checklist
  • Provided the tests for the changes.
  • Assure PR title is short, clear, and good to be included in the user-oriented changelog
Maintainer checklist
  • Assure one of these labels is present: backwards incompatible, feature, enhancement, deprecation, bug, dependency, docs or skip-changelog as they determine changelog listing.
  • Assign the PR to an existing or new milestone for the target version (following Semantic Versioning).

@stefansjs
Copy link
Author

This is a review for #1536

@stefansjs stefansjs marked this pull request as draft January 26, 2022 00:16
@stefansjs stefansjs marked this pull request as ready for review January 26, 2022 01:05
@michael-k
Copy link
Contributor

This would also fix #1135. But the current implementation won't work with index URLs ending in a slash, see #1669. aws codeartifact login generates such index URLs.

@stefansjs do you plan to continue with the PR? I'm happy to help getting it ready. :)

@stefansjs
Copy link
Author

Yes I would love to. I noticed that the latest master is a non trivial merge. I haven't looked deeply into what needs to change. I would really appreciate and help/guidance. It would be great to get this into master.

)
return package_links

def _get_project(self, ireq: InstallRequirement) -> Optional[Dict[str, Any]]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyPIRepository._get_project is unused

Comment on lines +376 to +384
pypi_json = {
link.comes_from: self._get_json_from_index(link)
for link in all_package_links
}
pypi_hashes = {
url: self._get_hash_from_json(json_resp, ireq)
for url, json_resp in pypi_json.items()
if json_resp
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why dicts and not generators? This would only deduplicate something if someone used the same index multiple times. But even in this case _get_json_from_index (which takes the most time here due to network IO) would be called multiple times.

Comment on lines +437 to +442
return set(
itertools.chain.from_iterable(
self._get_hashes_for_link(candidate.link, pypi_hashes)
for candidate in matching_candidates
)
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return set(
itertools.chain.from_iterable(
self._get_hashes_for_link(candidate.link, pypi_hashes)
for candidate in matching_candidates
)
)
return {
file_hash
for candidate in matching_candidates
for file_hash in self._get_hashes_for_link(candidate.link, pypi_hashes)
}

@michael-k
Copy link
Contributor

This would also fix #1135.

This was a false assumption after skimming over the code and noticing the package_index.simple_url. After a closer look I realize that it's only used to indicate where a link was coming from.

@atugushev
Copy link
Member

This PR might be superseded by #1723. I'm I right @neykov?

@neykov
Copy link
Contributor

neykov commented Nov 19, 2022

This PR might be superseded by #1723. I'm I right @neykov?

Looks like this PR goes one step further, fetching hashes from all available indexes, not just the primary one. It would still be useful to clean up and merge after #1723.

@jeffwidman
Copy link
Contributor

#1723 has been merged, @stefansjs do you want to rebase this per #1556 (comment)?

@chrysle chrysle added the needs rebase Need to rebase or merge label Jan 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working needs rebase Need to rebase or merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pip-compile doesn't provide hashes for wheels hosted by simple index servers
6 participants