Fix quoting logic for attribute values #339
base: master
Instead of having quote_if_necessary() try to be all things in all situations, split it into quote_id_if_necessary(), which uses the same logic as before, and quote_attr_if_necessary(), which is heavily biased towards quoting: it leaves a value unquoted only if it is numeric, HTML, or already double-quoted.
Adjust three expected strings to match the new quoting logic for attributes, which will quote strings far more often than before.
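A minimal sketch of the attribute rule described above, for illustration only. It is not pydot's actual implementation, and the name quote_attr_sketch() is hypothetical. The numeral pattern follows the Graphviz DOT grammar; treating any <...> value as HTML and escaping embedded double quotes are assumptions of this sketch.

```python
import re

# DOT numeral per the Graphviz grammar: [-]?(.[0-9]+ | [0-9]+(.[0-9]*)?)
_NUMERAL = re.compile(r"-?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)")


def quote_attr_sketch(value: str) -> str:
    """Hypothetical: quote an attribute value unless numeric, HTML, or already quoted."""
    if _NUMERAL.fullmatch(value):
        return value  # numerals are valid bare IDs
    if value.startswith("<") and value.endswith(">"):
        return value  # assumed HTML-like label; quoting it would change its meaning
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return value  # already a double-quoted string
    # Everything else is quoted; embedded double quotes are escaped (assumption).
    escaped = value.replace('"', '\\"')
    return f'"{escaped}"'
```

With this rule, "box" becomes "\"box\"", "foo:n" becomes "\"foo:n\"", while "1.5" and "<<b>x</b>>" pass through untouched.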
Hmm. It seems GitHub has started running some [...]. Python 3.7 is actually EOL as of 2023-06-27, so we could just drop it entirely. (We could add 3.13 as a replacement. It won't technically be out until October, but testing on the pre-release versions to ensure compatibility isn't the worst idea.)
Also, fix some docstrings.
Set to Draft state because I just noticed there's a full grammar for ID strings in the Graphviz docs, including regular expressions, which don't match ours. Seems like a good idea to make them match.
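For reference, the DOT language docs list four kinds of ID: a name made of alphabetic characters (including bytes \200-\377), underscores, or digits, not beginning with a digit; a numeral; a double-quoted string with escaped quotes; or an HTML string <...>. The keywords node, edge, graph, digraph, subgraph, and strict are case-independent. The hypothetical is_dot_id() below is a rough Python translation of those rules. Two simplifications are assumptions: \200-\377 is mapped to "any non-ASCII character" (true for UTF-8 input), and the HTML check only verifies balanced angle brackets.

```python
import re

# Every non-ASCII character encodes to bytes in \200-\377 under UTF-8,
# so the "alphabetic" class accepts any non-ASCII character (assumption).
_ALPHA = "A-Za-z_\u0080-\U0010ffff"
_NAME = re.compile(f"[{_ALPHA}][{_ALPHA}0-9]*")
_NUMERAL = re.compile(r"-?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)")
# Double-quoted string; a backslash-escaped quote doesn't end it.
_QUOTED = re.compile(r'"(?:[^"\\]|\\.)*"', re.DOTALL)
# Keywords are case-independent and can't be used as bare names.
_KEYWORDS = {"node", "edge", "graph", "digraph", "subgraph", "strict"}


def _is_html(s: str) -> bool:
    """Simplified HTML-string check: <...> with balanced angle brackets."""
    if not (s.startswith("<") and s.endswith(">")):
        return False
    depth = 0
    for i, ch in enumerate(s):
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
            # The outermost '<' may only close at the very last character.
            if depth < 0 or (depth == 0 and i != len(s) - 1):
                return False
    return depth == 0


def is_dot_id(s: str) -> bool:
    """Hypothetical: True if s can be written to DOT as-is, as one ID."""
    if _NAME.fullmatch(s):
        return s.lower() not in _KEYWORDS
    return bool(_NUMERAL.fullmatch(s) or _QUOTED.fullmatch(s) or _is_html(s))
```

Note that foo:n is not a single ID under these rules; it only parses as a node ID followed by a port.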
So, turns out I was mostly right. The distinction isn't between attribute values and IDs, but between IDs and [...]. All attribute values are IDs in their grammar, but only the [...]

Technically we could even double-quote node IDs, since all IDs are just strings and it makes no difference to Graphviz whether we write...

# This
strict digraph {
a [shape=ellipse style=filled fillcolor="#1f77b4"]
b [shape=polygon style=filled fillcolor="#ff7f0e"]
a -> b [fillcolor="#a6cee3" color="#1f78b4"]
}
# Or this
strict digraph {
"a" [shape="ellipse" style="filled" fillcolor="#1f77b4"]
"b" [shape="polygon" style="filled" fillcolor="#ff7f0e"]
"a" -> "b" [fillcolor="#a6cee3" color="#1f78b4"]
}

We just have to be careful that...

# This... (OK)
digraph {
node1:port1 -> node2:port5:nw;
}
# Doesn't get turned into this... (WRONG)
digraph {
"node1:port1" -> "node2:port5:nw";
}
# But it can be turned into this... (OK)
digraph {
"node1":"port1" -> "node2":"port5":"nw";
}

Part of me almost thinks we should give the user the ability to set the [...]

IOW, if a user had to define the graph above not with this:

G = pydot.Dot()
G.add_edge(pydot.Edge("node1:port1", "node2:port5:nw"))

But with something like this:

G = pydot.Dot()
node1 = pydot.Node("node1")
node2 = pydot.Node("node2")
ep1 = pydot.Endpoint(node1, port="port1")
ep2 = pydot.Endpoint(node2, port="port5", compass="nw")
G.add_edge(pydot.Edge(ep1, ep2))

Then constructing the DOT syntax would be a whole hell of a lot easier. And if we provided some way to turn off automatic port/compass parsing for users who don't want it, they could name their nodes things like [...]

Right now, we attempt to parse out the [...]

Even worse, if you set your graph up like this:

G = pydot.Dot()
node1 = pydot.Node("node1:port1")
node2 = pydot.Node("node2:port5:nw")
G.add_edge(pydot.Edge(node1, node2))

You'll get this output:

digraph G {
node1 -> node2;
}

We lose everything except the name when a [...]

And that [...]:

>>> node2.get_port()
':port5:nw'

Whoops. Maybe I'll work on adding [...]

Heck, we could even provide an enum for the valid compass points, to be used like:

ep1 = pydot.Endpoint(
    node2, port="port5", compass=pydot.Compass.nw)
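Here is a self-contained sketch of the Endpoint/Compass idea, written without pydot. Endpoint, Compass, and edge_stmt() are hypothetical names (pydot has no such API today), and this Endpoint takes a plain node-name string rather than a pydot.Node. Node and port are each quoted as their own ID, which is what keeps the output in the "node1":"port1" form instead of collapsing into a single "node1:port1" string.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Compass(Enum):
    """Graphviz compass points ("_" is also valid in DOT; omitted here)."""
    n = "n"
    ne = "ne"
    e = "e"
    se = "se"
    s = "s"
    sw = "sw"
    w = "w"
    nw = "nw"
    c = "c"


def _quote(s: str) -> str:
    """Always double-quote one ID, escaping embedded double quotes."""
    return '"' + s.replace('"', '\\"') + '"'


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical: a node name plus optional port and compass point."""
    node: str
    port: Optional[str] = None
    compass: Optional[Compass] = None

    def to_dot(self) -> str:
        # Node and port are separate IDs, so a ':' inside either one stays
        # part of that name instead of being read as a separator.
        parts = [_quote(self.node)]
        if self.port is not None:
            parts.append(_quote(self.port))
        if self.compass is not None:
            # Enum values are always valid bare IDs, so they stay unquoted.
            parts.append(self.compass.value)
        return ":".join(parts)


def edge_stmt(src: Endpoint, dst: Endpoint) -> str:
    """Render a single DOT edge statement between two endpoints."""
    return f"{src.to_dot()} -> {dst.to_dot()};"


ep1 = Endpoint("node1", port="port1")
ep2 = Endpoint("node2", port="port5", compass=Compass.nw)
print(edge_stmt(ep1, ep2))  # "node1":"port1" -> "node2":"port5":nw;
```

Because the compass is an enum, an invalid compass point can't even be constructed, and a node named "01:Math" renders as the single ID "01:Math".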
So, no less than Stephen North hisself has confirmed:

digraph {
# Complex Node statements like...
nodeName:portname:nw [attributes=values];
# Are meaningless, and always equivalent to...
nodeName [attributes=values];
}

He says it was a "small mistake" that the grammar was specified to make the former legal syntax. The [...]

Obviously we can't just disregard the grammar entirely, but IMHO (and as I argued in that discussion), this does give us the leeway to parse

pydot.Node("01:Math", color="red")

as if the user actually wrote:

pydot.Node("\"01:Math\"", color="red")

And we can therefore assume they expect this as the resulting [...]:

"01:Math" [color=red];

As opposed to the original, un-double-doublequoted [...]
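A minimal sketch of that reading, using a hypothetical node_stmt() helper: the node name is always emitted as one double-quoted ID (unless it is already quoted), so a colon in it can never be mistaken for a port. Attribute values are quoted the same way, in line with the attribute rule from this PR; color="red" and color=red mean the same thing to Graphviz.

```python
def _quote_id(s: str) -> str:
    """Quote s as a single ID unless it is already double-quoted."""
    if len(s) >= 2 and s.startswith('"') and s.endswith('"'):
        return s
    return '"' + s.replace('"', '\\"') + '"'


def node_stmt(name: str, **attrs: str) -> str:
    """Hypothetical: render a node statement without any port parsing."""
    body = _quote_id(name)  # '01:Math' stays one ID; ':' is never a port here
    if not attrs:
        return f"{body};"
    a_list = " ".join(f"{key}={_quote_id(value)}" for key, value in attrs.items())
    return f"{body} [{a_list}];"
```

Both node_stmt("01:Math", color="red") and node_stmt('"01:Math"', color="red") produce the same statement, so users who already pre-quote their names aren't double-quoted.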
That's a long journey. BTW, the port definition is weird: there's another kind of port, in the form "<...>", but in a different context (https://graphviz.org/doc/info/shapes.html, section "Record-based nodes").
Yeah, that goes inside the label attribute for the destination node; it's how you provide targets for the edge-endpoint ports. Technically, without those labels, any ports attached to endpoints do nothing; with labels defined, the [...]

Apparently that whole "thing" is now discouraged in favor of HTML-like labels anyway. (Which can also define ports, by adding a [...])
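For comparison, this is roughly what the record-label form looks like when generated from Python. record_label() and record_graph() are hypothetical helpers. Escaping braces, vertical bars, and angle brackets follows the "Record-based nodes" section of the Graphviz shapes docs; escaping embedded double quotes for the surrounding DOT string is this sketch's own addition.

```python
def _escape_record_field(text: str) -> str:
    # Braces, vertical bars and angle brackets are record-label syntax,
    # so they must be backslash-escaped to appear literally in a field.
    for ch in "{}|<>":
        text = text.replace(ch, "\\" + ch)
    return text


def record_label(fields: dict[str, str]) -> str:
    """Build '<port> text|<port> text|...' from a port -> text mapping."""
    return "|".join(
        f"<{port}> {_escape_record_field(text)}" for port, text in fields.items()
    )


def _dot_string(s: str) -> str:
    # Wrap the label in a DOT double-quoted string.
    return '"' + s.replace('"', '\\"') + '"'


def record_graph() -> str:
    """Two record nodes; the edge targets field ports f1 and f0."""
    left = record_label({"f0": "left", "f1": "middle", "f2": "right"})
    right = record_label({"f0": "one", "f1": "two"})
    return "\n".join([
        "digraph {",
        f"  struct1 [shape=record label={_dot_string(left)}];",
        f"  struct2 [shape=record label={_dot_string(right)}];",
        "  struct1:f1 -> struct2:f0;",
        "}",
    ])
```

Here the ports f1 and f0 in the edge statement only mean something because the <f1> and <f0> fields exist in the record labels.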
This started off as an effort to simply fix quoting for attribute values containing colons (#258), which was being hosed by the logic that recognizes ID values in the form foo:n as IDs with an optional port value. That's what made me realize that the rules for quoting IDs and the rules for quoting attributes are totally different, and we shouldn't be trying to make them the same.
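To illustrate the collision, here is a simplified, hypothetical stand-in for the old shared rule (not pydot's code): anything shaped like name[:port[:compass]] is left bare. That is right for an edge endpoint, but for an attribute it produces label=foo:n, and the DOT grammar expects a single ID after the "=", which a bare foo:n is not.

```python
import re

# Hypothetical, simplified stand-in for the old one-size-fits-all rule.
_NAME = r"[A-Za-z_][A-Za-z_0-9]*"
# A name, optionally followed by up to two ':'-separated parts (port, compass).
_NODE_PORT = re.compile(rf"{_NAME}(?::{_NAME}){{0,2}}")


def old_quote_sketch(s: str) -> str:
    """Leave node[:port[:compass]]-shaped strings bare, quote everything else."""
    if _NODE_PORT.fullmatch(s):
        return s
    return '"' + s.replace('"', '\\"') + '"'


def edge_stmt(src: str, dst: str) -> str:
    # Correct: endpoints really can carry ports and compass points.
    return f"{old_quote_sketch(src)} -> {old_quote_sketch(dst)};"


def attr_stmt(node: str, key: str, value: str) -> str:
    # Broken: an attribute value containing ':' is emitted unquoted.
    return f"{old_quote_sketch(node)} [{key}={old_quote_sketch(value)}];"
```

edge_stmt("node1:port1", "node2:port5:nw") gives the intended endpoints, while attr_stmt("a", "label", "foo:n") yields a [label=foo:n]; with its unquoted colon.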
So, this PR breaks out quote_if_necessary() into two functions: quote_id_if_necessary() (which keeps the same logic as before), and quote_attr_if_necessary(), which is heavily biased towards quoting. It'll basically quote anything that isn't numeric, HTML, or already double-quoted. Simple string values like shape=box that used to be left unquoted will now be output as shape="box", because it's never wrong to do that. And it avoids a lot of problems.

I also removed the re.UNICODE flag from all the regexps, since in Python 3 that's already the default for str patterns, and replaced a '"' + s + '"' with fr'"{s}"'. (Though that needn't have been a raw string, come to think of it.)

Three tests (only 3!!) had to be adjusted so their expected values match the new quoting rules.
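A quick standard-library check of both cleanups: a str pattern compiled with and without re.UNICODE ends up with identical flags and matches non-ASCII word characters either way, and fr'"{s}"' builds the same string as '"' + s + '"'. The helper names below are illustrative only.

```python
import re


def unicode_flag_is_redundant(pattern: str, text: str) -> bool:
    """Compile pattern with and without re.UNICODE and compare the results."""
    plain = re.compile(pattern)
    flagged = re.compile(pattern, re.UNICODE)
    # For str patterns, Python 3 sets re.UNICODE automatically.
    same_flags = plain.flags == flagged.flags
    same_match = bool(plain.fullmatch(text)) == bool(flagged.fullmatch(text))
    return same_flags and same_match


def quote(s: str) -> str:
    # The literal has no backslashes, so the r in fr'...' changes nothing.
    return fr'"{s}"'
```

The raw prefix would only matter if the literal part contained a backslash, which is why a plain f'"{s}"' would have done just as well.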
Fixes #258
(Doesn't fix the other issues involving IDs, like #118)