Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Typing pending_xref node attrributes (and maybe all addnodes) #12208

Open
chrisjsewell opened this issue Mar 26, 2024 · 5 comments
Open

Typing pending_xref node attrributes (and maybe all addnodes) #12208

chrisjsewell opened this issue Mar 26, 2024 · 5 comments

Comments

@chrisjsewell
Copy link
Member

chrisjsewell commented Mar 26, 2024

Currently, pending_xref includes no information about what attributes it is expected to have:

sphinx/sphinx/addnodes.py

Lines 477 to 485 in bfce4f5

class pending_xref(nodes.Inline, nodes.Element):
"""Node for cross-references that cannot be resolved without complete
information about all documents.
These nodes are resolved before writing output, in
BuildEnvironment.resolve_references.
"""
child_text_separator = ''

There are definitely some attributes that are expected, i.e. they are accessed with node['name'] and would KeyError except if the attribute is not present, then there are ones that are maybe optional, i.e. they are accessed with node.get('name')

I haven't checked if there are actually any open issue, but there definitely seems like there are discrepancies in the code base, here are some examples:

elif node.get('refdomain', 'std') not in ('', 'std'):
msg = (__('%s:%s reference target not found: %s') %
(node['refdomain'], typ, target))

here the "check" treats refdomain as if it is optional, but then the msg creation treats it as if it is required 😅

for pendingnode in largetree.findall(addnodes.pending_xref):
docname = pendingnode['refdocname']
sectname = pendingnode['refsectname']

here refsectname is treated as required, but if you actually search for refsectname in the code base, it is definitely not set in a lot of places:

sphinx/builders/texinfo.py:
  166:             sectname = pendingnode['refsectname']
sphinx/builders/latex/__init__.py:
  375:             sectname = pendingnode['refsectname']
sphinx/domains/std/__init__.py:
  776:             contnode['refsectname'] = sectname

Am I missing something here? How is some of this stuff not excepting?

@danieleades do you have any thoughts on how to type Node subclasses, to have some kind of enforcing of available attributes.
Ideally it would be enforced for instantiating the node, and for accessing attributes in an existing node, but I know this would be extremely difficult, if not impossible 😏

Also cc @picnixz may have an opinion?

@picnixz
Copy link
Member

picnixz commented Mar 26, 2024

Ah yes, the pending_xref node. The issue with the attributes is that there were different people implementing different things at different places and I don't think we ever had a doc telling the exact specs of that node.

here the "check" treats refdomain as if it is optional, but then the msg creation treats it as if it is required 😅

It becomes "required" in the sense that node.get('refdomain', 'std') returns 'std' or node['refdomain']. But since you excluded the 'std' case, you get node['refdomain'] so it's fine.

here refsectname is treated as required, but if you actually search for refsectname in the code base, it is definitely not set in a lot of places:

The standard domain is always present, so the corresponding transformations and post-transformations as well. In particular, when you call env.resolve_references(largetree, ...), you also call the resolve_xref which may either remove your pending_xref nodes or build dangling references nodes that are still instances of pending_xref nodes. You can see that there are not a lot of places where you explicitly create pending_xref nodes after resolving (the resolve_xref calling _resolving_ref in the standard domain is an example) so I think it's the only place which, when after resolving everything, you still have the possibility of having such nodes (with the expected attribute).

Now, we should have added a better comment or be sure that the only remaining pending_xref nodes are created by the standard domain for ref.

@picnixz
Copy link
Member

picnixz commented Mar 26, 2024

As for your original question: yes and no. Yes for common attributes that are known to exist, but that's not always the case (like, refdomain might not exist in some extensions) and you might not know at construction time the final information that is needed (for instance Domain.process_field_xref might add additional information to the node when called).

@danieleades
Copy link
Contributor

I'm not familiar with this part of the code base, so my comments will be general rather than specific.

Sphinx uses 'stringly-typed' interfaces in a lot of places by using a dict-like interface for many objects. As you observed, often without default arguments or error handling. That doesn't imply it's a bug, more likely the author knows some invariant that mypy doesn't. The problem comes if you refactor and unknowingly violate the invariant... If we can refactor more code to be interrogable by mypy then refactoring can become more "fearless"

Where we're absolutely certain that a property must always exist on a class or instance of a class then I think it makes sense to create a subclass and set the property in the init method (or potentially as class variables where appropriate) and then access these properties directly rather than through the dict-like interface.

@chrisjsewell
Copy link
Member Author

chrisjsewell commented Mar 26, 2024

like, refdomain might not exist in some extensions

Well, that is one of the things I'm trying to work out for myst-parser 😅
But I think this has to exist, since it is expected in a number of places:

sphinx/builders/latex/transforms.py:
  563:             if node['refdomain'] == 'math' and node['reftype'] in ('eq', 'numref'):

sphinx/domains/math.py:
  32:         node['refdomain'] = 'math'

sphinx/ext/intersphinx.py:
  751:                        (node['refdomain'], typ, node['reftarget']))

sphinx/transforms/i18n.py:
  308:             case = node["refdomain"], node["reftype"]
  313:                     node["refdomain"],

sphinx/transforms/post_transforms/__init__.py:
   87:                         domain = self.env.domains[node['refdomain']]
  210:                    (node['refdomain'], typ, target))

tests/test_extensions/test_ext_intersphinx.py:
  32:     node['refdomain'] = domain

If we can refactor more code to be interrogable by mypy then refactoring can become more "fearless"

exactly, amen to that 🙏

@picnixz
Copy link
Member

picnixz commented Mar 26, 2024

I'd be happy to move out to property-based interface but I think the choice is mostly historical and docutils motivated. Alternatively, I could suggest having a mypy plugin for that instead because it could be integrated in the current framework without refactoring anything (or barely).

But I think this has to exist, since it is expected in a number of places:

It is expected but I'm not sure that it is necessarily expected at the beginning of the transformation chain. Transformations and post-transformations have different priorities and I'm pretty sure there are some that dynamically add information because their authors know some invariants (that we do not) to process it in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants