This issue is split off from issue #203. We found out there that the parser breaks off lines inside DOT HTML-like labels containing <BR/>#. It breaks them off starting from the hash/pound sign/number sign (#).
Here are two minimal reproducible examples:
Example 1 shows that everything starting from # to the end of the line is dropped:
import pydot

G = pydot.graph_from_dot_data("""graph G {
    b1 [label=<
        We are the knights who say <BR/># Ni
    >];
}""")
print(G[0])
Output:
graph G {
b1 [label=<
We are the knights who say <BR/>
>];
}
Notice that # Ni is missing.
Example 2 shows that this can become a parsing error when parts with a syntactic meaning are lost:
import pydot

G = pydot.graph_from_dot_data("""
graph G { b4 [label=<We are the knights who say <BR/># Ni>]; }""")
Output:
graph G { b4 [label=<We are the knights who say <BR/># Ni>]; }
             ^
Expected "}", found '[' (at char 14), (line:2, col:14)
These three lines of output are the "explanation" of the ParseException raised by PyParsing. Although the whole input line is printed as part of that explanation, the parser currently does not see anything past the # character, so it never reaches the closing brackets >]; } at the end of the line. That is why it reports that it expected } but could not find it.
Apparently the # is treated as the start of a comment, but that cannot be the whole story, because the truncation does not happen in every case. Other tests I did suggest that the word-by-word parsing, which uses whitespace as the delimiter, plays a role as well. I suspect that our definition of the DOT language in PyParsing terms in dot_parser will need to be adjusted to fix this bug.
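To make the comment hypothesis concrete, here is a minimal standard-library sketch of what a naive "drop everything from # to the end of the line" rule would do to the two examples. It is only a model of the suspected behaviour, not the code in dot_parser; the helper names drop_hash_comment and strip_comments are made up for this illustration. As noted above, the real parser does not truncate in every case, so this does not capture the whitespace-related part of the problem.

```python
def drop_hash_comment(line: str) -> str:
    """Hypothetical model: discard everything from the first '#' onward."""
    head, _sep, _tail = line.partition("#")
    return head.rstrip()


def strip_comments(dot_source: str) -> str:
    """Apply the naive comment rule to every line of a DOT string."""
    return "\n".join(drop_hash_comment(line) for line in dot_source.splitlines())


# Example 1: the '#' sits at the end of its own line, so only '# Ni' is lost
# and the result is still balanced.
example_1 = """graph G {
    b1 [label=<
        We are the knights who say <BR/># Ni
    >];
}"""

# Example 2: the closing '>]; }' is on the same line as the '#', so it is
# lost as well and the remaining text is no longer valid DOT.
example_2 = "graph G { b4 [label=<We are the knights who say <BR/># Ni>]; }"

print(strip_comments(example_1))
print(strip_comments(example_2))
```

Under this model, Example 1 merely loses text from the label, while Example 2 loses the brackets that close the label, the attribute list and the graph, which matches the difference between silent truncation and a ParseException seen above.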
I will try to post more details later as I find time. If someone else wants to dive into this, please let me know so that we won't be doing the same work twice.
Versions used for examples: Python 3.7.3, pydot 1.4.1+PR227, pyparsing 2.4.7.