Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow fallback to return multiple identifiers to try #11

Closed
pawamoy opened this issue Nov 22, 2021 · 6 comments
Closed

Allow fallback to return multiple identifiers to try #11

pawamoy opened this issue Nov 22, 2021 · 6 comments

Comments

@pawamoy
Copy link
Member

pawamoy commented Nov 22, 2021

Currently, when autorefs doesn't know the URL for an identifier, it falls back to collecting the object through every handler (and getting an anchor through its renderer) until an anchor is found.

# get_item_url method in autorefs plugin
try:
    url = self._url_map[identifier]
except KeyError:
    if identifier in self._abs_url_map:
        return self._abs_url_map[identifier]

    if fallback:
        new_identifier = fallback(identifier)
        if new_identifier:
            return self.get_item_url(new_identifier, from_url)

    raise

if from_url is not None:
    return relative_url(from_url, url)
return url
# fallback method in mkdocstrings Handlers
for handler in self._handlers.values():
    try:
        anchor = handler.renderer.get_anchor(handler.collector.collect(identifier, {}))
    except CollectionError:
        continue
    if anchor is not None:
        return anchor
return None

It works well when canonical objects are rendered (the ones with canonical paths, i.e. the identifier corresponding to where they are defined, not imported/aliased), but not the other way around.

Example:

  • pkg.base defines Class
  • pkg imports pkg.base.Class as Class
  • therefore pkg.Class is an alias to pkg.base.Class
  • we render Class with its canonical path, autorefs maps its canonical path to its URL
  • now when autorefs fixes a [Class][pkg.Class] crossref, it falls back as mentioned above
  • the corresponding handler collects pkg.Class, and returns its canonical path: pkg.base.Class
  • great, autorefs knows this identifier, it can fix the crossref

The other way around:

  • (same) pkg.base defines Class
  • (same) pkg import pkg.base.Class as Class
  • (same) therefore pkg.Class is an alias to pkg.base.Class
  • we render Class with one of its alias path (there could be many), autorefs maps this alias path to its URL (it should not use the canonical path, because the alias is the public one, that's what the user wants)
  • now when autorefs fixes [Class][pkg.base.Class] (which happens, for example, because of Python annotations that are resolved using their actual scope, which is where they are defined, not imported), it falls back as mentioned above
  • the corresponding handler collects pkg.base.Class, and return its canonical path: pkg.base.Class
  • oops, that's the same path as before, autorefs is sad

So, what we need here is one of two things, or both:

  1. the fallback method returns multiple identifiers that autorefs can test
# get_item_url method in autorefs plugin
...
    if fallback:
        new_identifiers = fallback(identifier)
        for new_identifier in new_identifiers:
            with contextlib.suppress(KeyError):
                return self.get_item_url(new_identifier, from_url)
        raise
...
# fallback method in mkdocstrings Handlers
for handler in self._handlers.values():
    try:
        anchors = handler.renderer.get_anchors(handler.collector.collect(identifier, {}))
    except CollectionError:
        continue
    if anchors:
        return anchors
return []
  1. autorefs somehow updates its URL map with alias->canonical and canonical->alias as soon as possible. For example, when we render a canonical object, update the URL map with all the alias paths mapping to the canonical path (depends on the ability of the collector/collected data). When we render an alias, update the URL map with the canonical path mapping to the alias path.
    Of course if you render the same object, alias or canonical, multiple times, the last iteration wins as it will override mappings in the URL map. But we can consider it an invalid use-case, or a limited one.

The option 2 would also prevent redundant, costly fallbacks. Maybe this can already be improved by updating the URL map when finding a new valid anchor/identifier.

# get_item_url method in autorefs plugin
...
    if fallback:
        new_identifiers = fallback(identifier)
        for new_identifier in new_identifiers:
            with contextlib.suppress(KeyError):
                url = self.get_item_url(new_identifier, from_url)
                self._url_map[identifier] = url  # update the map to avoid doing all this again
                return url
        raise
...

Maybe we could call the renderers get_anchor[s] method early, in mkdocstrings.extension.AutodocProcessor.run and mkdocstrings.extension.AutodocProcessor._process_block, to register more anchors into autorefs.

# lets say _process_block also returns the object (data)
html, handler, data = self._process_block(identifier, block, heading_level)
...
for anchor in handler.renderer.get_anchors(data):
    self._autorefs.register_anchor(page, anchor)
@oprypin oprypin assigned oprypin and unassigned oprypin Nov 22, 2021
@oprypin
Copy link
Member

oprypin commented Nov 22, 2021

Oops I thought this was a pull request 😅
But I'll definitely check this out

@pawamoy
Copy link
Member Author

pawamoy commented Nov 22, 2021

Thanks! I'll experiment anyway, and send a PR if I get something elegant 🙂

@oprypin
Copy link
Member

oprypin commented Nov 23, 2021

we render Class with one of its alias path (there could be many), autorefs maps this alias path to its URL (it should not use the canonical path, because the alias is the public one, that's what the user wants)

Hmm I've certainly never planned for this case, I thought everything should always have unambiguous canonical path. That is, perhaps I straight up disagree with "it should not use the canonical path".
If the handler wants to do shenanigans with "canonical aliases", perhaps it should somehow consistently report one chosen path.

the corresponding handler collects pkg.base.Class, and return its canonical path: pkg.base.Class

Well no, the returned canonical path should be pkg.Class, that's what the user wants.


That's all I have to say for now and probably won't actually be able to look into this in detail.
I don't know if this is useful, but maybe it'll lead you to some interesting thought.

@pawamoy
Copy link
Member Author

pawamoy commented Nov 23, 2021

Sure, your input is always welcomed 🙂

That is, perhaps I straight up disagree with "it should not use the canonical path"

I thought about it, and yes, perhaps it could use both, which corresponds to the proposed solutions, but not the current situation.

Well no, the returned canonical path should be pkg.Class, that's what the user wants.

True as well, but considering an object can have multiple aliases, the handler has no way of knowing which one of the aliases the object is registered with.

Of course I've very biased toward Python, which certainly does not mirror other languages concepts. But I guess populating the URL map with more data would not be harmful anyway.

@oprypin
Copy link
Member

oprypin commented Nov 23, 2021

True as well, but considering an object can have multiple aliases, the handler has no way of knowing which one of the aliases the object is registered with.

I'm not sure if this is fully true.

The handler could perhaps memorize the first identifier it was collected through and use that henceforth.

But that would probably be considered more hacky than what you're suggesting..

@pawamoy
Copy link
Member Author

pawamoy commented Nov 23, 2021

Yeah there are multiple options here!

pawamoy added a commit to mkdocstrings/mkdocstrings that referenced this issue Nov 28, 2021
Since an object can have aliases (different identifiers
leading to it), and since users sometimes want to
render an object using one of its aliases instead of its
canonical identifier, we make sure to register every
identifier associated to an object so that autorefs
can find it when fixing cross-references.

Issue mkdocstrings/autorefs#11: mkdocstrings/autorefs#11
pawamoy added a commit to mkdocstrings/mkdocstrings that referenced this issue Dec 5, 2021
Since an object can have aliases (different identifiers
leading to it), and since users sometimes want to
render an object using one of its aliases instead of its
canonical identifier, we make sure to register every
identifier associated to an object so that autorefs
can find it when fixing cross-references.

Issue mkdocstrings/autorefs#11: mkdocstrings/autorefs#11
pawamoy added a commit to mkdocstrings/mkdocstrings that referenced this issue Dec 14, 2021
Since an object can have aliases (different identifiers
leading to it), and since users sometimes want to
render an object using one of its aliases instead of its
canonical identifier, we make sure to register every
identifier associated to an object so that autorefs
can find it when fixing cross-references.

Issue mkdocstrings/autorefs#11: mkdocstrings/autorefs#11
PR #350: #350
@pawamoy pawamoy closed this as completed Jan 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants