Typing `pending_xref` node attrributes (and maybe all `addnodes`) #12208

chrisjsewell · 2024-03-26T14:20:09Z

Currently, pending_xref includes no information about what attributes it is expected to have:

Lines 477 to 485 in bfce4f5

    
           class pending_xref(nodes.Inline, nodes.Element): 
        
               """Node for cross-references that cannot be resolved without complete 
        
               information about all documents. 
        
               These nodes are resolved before writing output, in 
        
               BuildEnvironment.resolve_references. 
        
               """ 
        
               child_text_separator = ''

There are definitely some attributes that are expected, i.e. they are accessed with node['name'] and would KeyError except if the attribute is not present, then there are ones that are maybe optional, i.e. they are accessed with node.get('name')

I haven't checked if there are actually any open issue, but there definitely seems like there are discrepancies in the code base, here are some examples:

sphinx/sphinx/transforms/post_transforms/__init__.py

Lines 208 to 210 in bfce4f5

    
           elif node.get('refdomain', 'std') not in ('', 'std'): 
        
               msg = (__('%s:%s reference target not found: %s') % 
        
                      (node['refdomain'], typ, target))

here the "check" treats refdomain as if it is optional, but then the msg creation treats it as if it is required 😅

sphinx/sphinx/builders/latex/__init__.py

Lines 373 to 375 in bfce4f5

    
           for pendingnode in largetree.findall(addnodes.pending_xref): 
        
               docname = pendingnode['refdocname'] 
        
               sectname = pendingnode['refsectname']

here refsectname is treated as required, but if you actually search for refsectname in the code base, it is definitely not set in a lot of places:

sphinx/builders/texinfo.py:
  166:             sectname = pendingnode['refsectname']
sphinx/builders/latex/__init__.py:
  375:             sectname = pendingnode['refsectname']
sphinx/domains/std/__init__.py:
  776:             contnode['refsectname'] = sectname

Am I missing something here? How is some of this stuff not excepting?

@danieleades do you have any thoughts on how to type Node subclasses, to have some kind of enforcing of available attributes.
Ideally it would be enforced for instantiating the node, and for accessing attributes in an existing node, but I know this would be extremely difficult, if not impossible 😏

Also cc @picnixz may have an opinion?

The text was updated successfully, but these errors were encountered:

picnixz · 2024-03-26T14:58:33Z

Ah yes, the pending_xref node. The issue with the attributes is that there were different people implementing different things at different places and I don't think we ever had a doc telling the exact specs of that node.

here the "check" treats refdomain as if it is optional, but then the msg creation treats it as if it is required 😅

It becomes "required" in the sense that node.get('refdomain', 'std') returns 'std' or node['refdomain']. But since you excluded the 'std' case, you get node['refdomain'] so it's fine.

here refsectname is treated as required, but if you actually search for refsectname in the code base, it is definitely not set in a lot of places:

The standard domain is always present, so the corresponding transformations and post-transformations as well. In particular, when you call env.resolve_references(largetree, ...), you also call the resolve_xref which may either remove your pending_xref nodes or build dangling references nodes that are still instances of pending_xref nodes. You can see that there are not a lot of places where you explicitly create pending_xref nodes after resolving (the resolve_xref calling _resolving_ref in the standard domain is an example) so I think it's the only place which, when after resolving everything, you still have the possibility of having such nodes (with the expected attribute).

Now, we should have added a better comment or be sure that the only remaining pending_xref nodes are created by the standard domain for ref.

picnixz · 2024-03-26T15:02:12Z

As for your original question: yes and no. Yes for common attributes that are known to exist, but that's not always the case (like, refdomain might not exist in some extensions) and you might not know at construction time the final information that is needed (for instance Domain.process_field_xref might add additional information to the node when called).

danieleades · 2024-03-26T15:10:11Z

I'm not familiar with this part of the code base, so my comments will be general rather than specific.

Sphinx uses 'stringly-typed' interfaces in a lot of places by using a dict-like interface for many objects. As you observed, often without default arguments or error handling. That doesn't imply it's a bug, more likely the author knows some invariant that mypy doesn't. The problem comes if you refactor and unknowingly violate the invariant... If we can refactor more code to be interrogable by mypy then refactoring can become more "fearless"

Where we're absolutely certain that a property must always exist on a class or instance of a class then I think it makes sense to create a subclass and set the property in the init method (or potentially as class variables where appropriate) and then access these properties directly rather than through the dict-like interface.

chrisjsewell · 2024-03-26T15:29:08Z

like, refdomain might not exist in some extensions

Well, that is one of the things I'm trying to work out for myst-parser 😅
But I think this has to exist, since it is expected in a number of places:

sphinx/builders/latex/transforms.py:
  563:             if node['refdomain'] == 'math' and node['reftype'] in ('eq', 'numref'):

sphinx/domains/math.py:
  32:         node['refdomain'] = 'math'

sphinx/ext/intersphinx.py:
  751:                        (node['refdomain'], typ, node['reftarget']))

sphinx/transforms/i18n.py:
  308:             case = node["refdomain"], node["reftype"]
  313:                     node["refdomain"],

sphinx/transforms/post_transforms/__init__.py:
   87:                         domain = self.env.domains[node['refdomain']]
  210:                    (node['refdomain'], typ, target))

tests/test_extensions/test_ext_intersphinx.py:
  32:     node['refdomain'] = domain

If we can refactor more code to be interrogable by mypy then refactoring can become more "fearless"

exactly, amen to that 🙏

picnixz · 2024-03-26T15:46:56Z

I'd be happy to move out to property-based interface but I think the choice is mostly historical and docutils motivated. Alternatively, I could suggest having a mypy plugin for that instead because it could be integrated in the current framework without refactoring anything (or barely).

But I think this has to exist, since it is expected in a number of places:

It is expected but I'm not sure that it is necessarily expected at the beginning of the transformation chain. Transformations and post-transformations have different priorities and I'm pretty sure there are some that dynamically add information because their authors know some invariants (that we do not) to process it in the future.

chrisjsewell added the internals:typing label Mar 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Typing `pending_xref` node attrributes (and maybe all `addnodes`) #12208

Typing `pending_xref` node attrributes (and maybe all `addnodes`) #12208

chrisjsewell commented Mar 26, 2024 •

edited

picnixz commented Mar 26, 2024

picnixz commented Mar 26, 2024

danieleades commented Mar 26, 2024

chrisjsewell commented Mar 26, 2024 •

edited

picnixz commented Mar 26, 2024

Typing pending_xref node attrributes (and maybe all addnodes) #12208

Typing pending_xref node attrributes (and maybe all addnodes) #12208

Comments

chrisjsewell commented Mar 26, 2024 • edited

picnixz commented Mar 26, 2024

picnixz commented Mar 26, 2024

danieleades commented Mar 26, 2024

chrisjsewell commented Mar 26, 2024 • edited

picnixz commented Mar 26, 2024

Typing `pending_xref` node attrributes (and maybe all `addnodes`) #12208

Typing `pending_xref` node attrributes (and maybe all `addnodes`) #12208

chrisjsewell commented Mar 26, 2024 •

edited

chrisjsewell commented Mar 26, 2024 •

edited