Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support files, not just directories, as single page sources #6885

Open
4 tasks done
yukw777 opened this issue Oct 25, 2022 · 15 comments · May be fixed by #9166
Open
4 tasks done

Support files, not just directories, as single page sources #6885

yukw777 opened this issue Oct 25, 2022 · 15 comments · May be fixed by #9166
Labels
kind/bug Something isn't working as expected status/triage This issue needs to be triaged

Comments

@yukw777
Copy link

yukw777 commented Oct 25, 2022

  • I am on the latest stable Poetry version, installed using a recommended method.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • I have consulted the FAQ and blog for any relevant entries or release notes.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option) and have included the output below.
  ValueError

  Package('mmcv-full', '1.6.2') is not in list

  at ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/repositories/legacy_repository.py:51 in package
       47│         Note that this will be cached so the subsequent operations
       48│         should be much faster.
       49│         """
       50│         try:
    →  51│             index = self._packages.index(Package(name, version))
       52│
       53│             return self._packages[index]
       54│         except ValueError:
       55│             package = super().package(name, version, extras)

The following error occurred when trying to handle this error:


  Stack trace:

  29  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/cleo/application.py:329 in run
       327│
       328│             try:
     → 329│                 exit_code = self._run(io)
       330│             except Exception as e:
       331│                 if not self._catch_exceptions:

  28  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/console/application.py:185 in _run
       183│         self._load_plugins(io)
       184│
     → 185│         exit_code: int = super()._run(io)
       186│         return exit_code
       187│

  27  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/cleo/application.py:423 in _run
       421│             io.input.set_stream(stream)
       422│
     → 423│         exit_code = self._run_command(command, io)
       424│         self._running_command = None
       425│

  26  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/cleo/application.py:465 in _run_command
       463│
       464│         if error is not None:
     → 465│             raise error
       466│
       467│         return event.exit_code

  25  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/cleo/application.py:449 in _run_command
       447│
       448│             if event.command_should_run():
     → 449│                 exit_code = command.run(io)
       450│             else:
       451│                 exit_code = ConsoleCommandEvent.RETURN_CODE_DISABLED

  24  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/cleo/commands/base_command.py:119 in run
       117│         io.input.validate()
       118│
     → 119│         status_code = self.execute(io)
       120│
       121│         if status_code is None:

  23  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/cleo/commands/command.py:83 in execute
        81│
        82│         try:
     →  83│             return self.handle()
        84│         except KeyboardInterrupt:
        85│             return 1

  22  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/console/commands/add.py:255 in handle
       253│         self.installer.whitelist([r["name"] for r in requirements])
       254│
     → 255│         status = self.installer.run()
       256│
       257│         if status == 0 and not self.option("dry-run"):

  21  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/installation/installer.py:115 in run
       113│             self._execute_operations = False
       114│
     → 115│         return self._do_install()
       116│
       117│     def dry_run(self, dry_run: bool = True) -> Installer:

  20  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/installation/installer.py:248 in _do_install
       246│                 source_root=self._env.path.joinpath("src")
       247│             ):
     → 248│                 ops = solver.solve(use_latest=self._whitelist).calculate_operations()
       249│         else:
       250│             self._io.write_line("Installing dependencies from lock file")

  19  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/puzzle/solver.py:73 in solve
        71│         with self._provider.progress():
        72│             start = time.time()
     →  73│             packages, depths = self._solve(use_latest=use_latest)
        74│             end = time.time()
        75│

  18  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/puzzle/solver.py:151 in _solve
       149│
       150│         try:
     → 151│             result = resolve_version(
       152│                 self._package, self._provider, locked=locked, use_latest=use_latest
       153│             )

  17  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/mixology/__init__.py:24 in resolve_version
        22│     solver = VersionSolver(root, provider, locked=locked, use_latest=use_latest)
        23│
     →  24│     return solver.solve()
        25│

  16  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/mixology/version_solver.py:127 in solve
       125│             while next is not None:
       126│                 self._propagate(next)
     → 127│                 next = self._choose_package_version()
       128│
       129│             return self._result()

  15  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/mixology/version_solver.py:446 in _choose_package_version
       444│             package = locked
       445│
     → 446│         package = self._provider.complete_package(package)
       447│
       448│         conflict = False

  14  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/puzzle/provider.py:529 in complete_package
       527│                 dependency_package = DependencyPackage(
       528│                     dependency,
     → 529│                     self._pool.package(
       530│                         package.pretty_name,
       531│                         package.version,

  13  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/repositories/pool.py:152 in package
       150│
       151│         if repository is not None and not self._ignore_repository_names:
     → 152│             return self.repository(repository).package(name, version, extras=extras)
       153│
       154│         for repo in self._repositories:

  12  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/repositories/legacy_repository.py:55 in package
        53│             return self._packages[index]
        54│         except ValueError:
     →  55│             package = super().package(name, version, extras)
        56│             package._source_type = "legacy"
        57│             package._source_url = self._url

  11  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/repositories/cached.py:86 in package
        84│         extras: list[str] | None = None,
        85│     ) -> Package:
     →  86│         return self.get_release_info(canonicalize_name(name), version).to_package(
        87│             name=name, extras=extras
        88│         )

  10  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/repositories/cached.py:63 in get_release_info
        61│             return PackageInfo.load(self._get_release_info(name, version))
        62│
     →  63│         cached = self._cache.remember_forever(
        64│             f"{name}:{version}", lambda: self._get_release_info(name, version)
        65│         )

   9  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/cachy/repository.py:174 in remember_forever
       172│             return val
       173│
     → 174│         val = value(callback)
       175│
       176│         self.forever(key, val)

   8  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/cachy/helpers.py:6 in value
         4│ def value(val):
         5│     if callable(val):
     →   6│         return val()
         7│
         8│     return val

   7  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/repositories/cached.py:64 in <lambda>
        62│
        63│         cached = self._cache.remember_forever(
     →  64│             f"{name}:{version}", lambda: self._get_release_info(name, version)
        65│         )
        66│

   6  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/repositories/legacy_repository.py:121 in _get_release_info
       119│         yanked = page.yanked(name, version)
       120│
     → 121│         return self._links_to_data(
       122│             links,
       123│             PackageInfo(

   5  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/repositories/http.py:232 in _links_to_data
       230│                 with temporary_directory() as temp_dir:
       231│                     filepath = Path(temp_dir) / link.filename
     → 232│                     self._download(link.url, filepath)
       233│
       234│                     known_hash = (

   4  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/repositories/http.py:72 in _download
        70│
        71│     def _download(self, url: str, dest: Path) -> None:
     →  72│         return download_file(url, dest, session=self.session)
        73│
        74│     def _get_info_from_wheel(self, url: str) -> PackageInfo:

   3  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/utils/helpers.py:84 in download_file
        82│     get = requests.get if not session else session.get
        83│
     →  84│     response = get(url, stream=True, timeout=REQUESTS_TIMEOUT)
        85│     response.raise_for_status()
        86│

   2  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/utils/authenticator.py:247 in get
       245│
       246│     def get(self, url: str, **kwargs: Any) -> requests.Response:
     → 247│         return self.request("get", url, **kwargs)
       248│
       249│     def post(self, url: str, **kwargs: Any) -> requests.Response:

   1  ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/poetry/utils/authenticator.py:233 in request
       231│                 if resp.status_code not in [502, 503, 504] or is_last_attempt:
       232│                     if raise_for_status:
     → 233│                         resp.raise_for_status()
       234│                     return resp
       235│

  HTTPError

  404 Client Error: Not Found for url: https://download.openmmlab.com/mmcv/dist/cu113/torch1.12/torch1.12.0/mmcv_full-1.6.2-cp310-cp310-manylinux1_x86_64.whl

  at ~/.local/share/pypoetry/venv/lib/python3.8/site-packages/requests/models.py:1021 in raise_for_status
      1017│                 f"{self.status_code} Server Error: {reason} for url: {self.url}"
      1018│             )
      1019│
      1020│         if http_error_msg:
    → 1021│             raise HTTPError(http_error_msg, response=self)
      1022│
      1023│     def close(self):
      1024│         """Releases the connection back to the pool. Once this method has been
      1025│         called the underlying ``raw`` object must not be accessed again.

Issue

I'm trying to install mmcv-full. They provide wheels via a single page link source, which I have added as a source in my pyproject.toml. Unfortunately, this page has relative paths with double dots, and poetry fails to resolve the correct wheel URLs. I believe the issue lies here:

if not url.endswith("/"):
url += "/"

Once a trailing slash is added, (https://download.openmmlab.com/mmcv/dist/cu113/torch1.12/index.html => https://download.openmmlab.com/mmcv/dist/cu113/torch1.12/index.html/), urllib.parse.urljoin() fails to properly join the urls here:

url = self.clean_link(urllib.parse.urljoin(self._url, href))

>>> import urllib.parse
# Wrong
>>> urllib.parse.urljoin("https://download.openmmlab.com/mmcv/dist/cu113/torch1.12/index.html/", "../torch1.12.0/mmcv_full-1.6.2-cp39-cp39-manylinux1_x86_64.whl")
'https://download.openmmlab.com/mmcv/dist/cu113/torch1.12/torch1.12.0/mmcv_full-1.6.2-cp39-cp39-manylinux1_x86_64.whl'
# Correct
>>> urllib.parse.urljoin("https://download.openmmlab.com/mmcv/dist/cu113/torch1.12/index.html", "../torch1.12.0/mmcv_full-1.6.2-cp39-cp39-manylinux1_x86_64.whl")
'https://download.openmmlab.com/mmcv/dist/cu113/torch1.12.0/mmcv_full-1.6.2-cp39-cp39-manylinux1_x86_64.whl'

Basically, we shouldn't always add trailing slashes to URLs.

@yukw777 yukw777 added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Oct 25, 2022
@neersighted
Copy link
Member

I think we could have better logic here, but wouldn't dropping index.html from your URL also solve it?

@yukw777
Copy link
Author

yukw777 commented Oct 25, 2022

@neersighted Unfortunately no b/c the server returns 404:

❯ http https://download.openmmlab.com/mmcv/dist/cu113/torch1.12
HTTP/1.1 404 Not Found
Connection: keep-alive
Content-Length: 267
Content-Type: application/xml
Date: Tue, 25 Oct 2022 16:28:17 GMT
Server: AliyunOSS
x-oss-request-id: 63580EA11748995FA9C66803
x-oss-server-time: 3

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist.</Message>
  <RequestId>63580EA11748995FA9C66803</RequestId>
  <HostId>download.openmmlab.com</HostId>
  <Key>mmcv/dist/cu113/torch1.12</Key>
</Error>


❯ http https://download.openmmlab.com/mmcv/dist/cu113/torch1.12/
HTTP/1.1 404 Not Found
Connection: keep-alive
Content-Length: 268
Content-Type: application/xml
Date: Tue, 25 Oct 2022 16:28:22 GMT
Server: AliyunOSS
x-oss-request-id: 63580EA56AA16F277B8D9C3C
x-oss-server-time: 1

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The specified key does not exist.</Message>
  <RequestId>63580EA56AA16F277B8D9C3C</RequestId>
  <HostId>download.openmmlab.com</HostId>
  <Key>mmcv/dist/cu113/torch1.12/</Key>
</Error>

@neersighted
Copy link
Member

Interesting... Looks like a S3-compatible object store. I suppose we didn't consider that use case (servers that don't serve an index page) when designing this feature -- the title should probably be more along the lines of "support files, not just directories, as single page sources." It might be slightly more involved than not adding a trailing slash -- I'd invite you to explore and write some more robust tests for this is you are interested.

@yukw777
Copy link
Author

yukw777 commented Oct 25, 2022

I see. I'm happy to help as much as I can. What kind of tests are you looking for? It seems more involved now since we're not dealing with servers that don't serve index pages.

In the meantime, any workaround?

@yukw777 yukw777 changed the title If a single page link source has relative paths with double dots, poetry fails to resolve to correct wheel URLs. Support files, not just directories, as single page sources Oct 25, 2022
@neersighted
Copy link
Member

As far as tests go, ideally some fixtures/HTTP mocking that reproduce the path structure you're having problems with. After introducing those failing tests, you can more confidently iterate on fixing this.

It could be as simple as the trailing slash, or it could be more complex -- adding as many test cases (for reasonable/real world structures/deployments) as you can think of and fixing the ones that fail is the best way to figure out if a more holistic fix is needed, or if it's just the trailing slash.

Regarding workarounds, a reverse proxy rewriting URLs seems like the easiest/fastest to deploy; if we do have a code fix for this in Poetry, I personally would not be opposed to backporting a fix to the 1.2 branch so we can get it out sooner.

@yukw777
Copy link
Author

yukw777 commented Oct 25, 2022

@neersighted Let me know where I should be adding these test cases, and I’ll see what I can do!

@Maciej-Zwolinski
Copy link

Has anything happened on this front since late-OCT?

I'm having exactly same issue with adding mmcv-full. Adding it directly via {url=""} results in multiple downloads (of the same package/whl) resulting in additional ~300s in every poetry add call which is annoying, especially that it already takes ~30minutes.

@neersighted
Copy link
Member

That's #2415. Unfortunately, code does not appear from nothing; what you see on GitHub is representative of the progress this feature request has made.

@Maciej-Zwolinski
Copy link

Sorry, but I fail to see how these 2 are related. I managed to fix this particular issue with 2 lines of code - many thanks to @yukw777 for pinpointing where the URL is created, so it only took me 10 minutes of setting up local poetry version rather than couple of hours (probably) trying to find where the URLs are born.

Anywho, another, somewhat, related issue we've been experiencing was when mmcv-full was declared with direct {url=} to wheel were the multiple downloads when trying to add some other package. This issue also got solved - why, idk, but it seems downloading/resolving/parsing using source= somehow fixes the issue - previously we experienced the same with PyTorch, but after moving it to source-based declaration the problem was solved.

@neersighted
Copy link
Member

neersighted commented Feb 13, 2023

#2415 describes what you are complaining of -- only source based dependencies are cached. This issue offers one possible solution to solving your issue, but is lower priority than the root cause (#2415) I would say. Indexes are the primary mechanism Poetry is designed around, and definitely a better solution if you can get them to work.

@Maciej-Zwolinski
Copy link

So, do I understand it correctly, url= is preferable to source= ? Does it mean that new features supporting use-cases involving sources are not to be developed - meaning sources ending with index.html will never be compatible with source= declaration?

Also, if I offer a simple fix to THIS particular sub-issue ( source=".../index.html" ) it is not likely to be integrated into poetry proper, do I understand you correctly?

@neersighted
Copy link
Member

You have the meaning completely inverted:

  • The issue you are describing is fundamentally url=, which is not preferred to sources. While it is fortunate that you are able to switch to an index (and this is preferred), it does not negate Poetry downloading same wheels multiple times within a single invocation #2415 as a sharper edge.
  • As per the previous discussion in this thread, a fix for this hardcoded trailing slash is desired but will require tests and a bit of minor refactoring.

@Maciej-Zwolinski
Copy link

Alright, I'll make a PR then. I haven't done any testing besides running our massive repo through this modified version of poetry with switch from url to source, so I'll have to look into it. Some help might be required :)

@yukw777
Copy link
Author

yukw777 commented Feb 13, 2023

@Maciej-Zwolinski Sorry I dropped the ball on this one, but I'm glad my trace has been helpful to you. Please let me know if you need any help with your PR!

@dimbleby
Copy link
Contributor

dimbleby commented Mar 9, 2023

just not adding the trailing slash seems like the correct fix, the danger is that there are users out there who are unknowingly relying on the trailing slash and will be broken by it not being there.

IMO best rip the bandage: make the right fix, say something about it in the changelog, done.

dimbleby added a commit to dimbleby/poetry that referenced this issue Mar 16, 2024
@dimbleby dimbleby linked a pull request Mar 16, 2024 that will close this issue
dimbleby added a commit to dimbleby/poetry that referenced this issue Mar 17, 2024
dimbleby added a commit to dimbleby/poetry that referenced this issue Mar 18, 2024
dimbleby added a commit to dimbleby/poetry that referenced this issue Mar 18, 2024
dimbleby added a commit to dimbleby/poetry that referenced this issue Mar 24, 2024
dimbleby added a commit to dimbleby/poetry that referenced this issue Mar 25, 2024
neersighted pushed a commit to dimbleby/poetry that referenced this issue Mar 25, 2024
neersighted pushed a commit to dimbleby/poetry that referenced this issue Mar 25, 2024
dimbleby added a commit to dimbleby/poetry that referenced this issue Apr 8, 2024
dimbleby added a commit to dimbleby/poetry that referenced this issue Apr 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working as expected status/triage This issue needs to be triaged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants