Parser breaks off lines at hash symbol following HTML tag #235

peternowee · 2020-08-29T15:56:56Z

This issue is split off from issue #203, where we found that the parser breaks off lines inside DOT HTML-like labels containing <BR/>#: everything from the hash sign (#, also called pound sign or number sign) onward is dropped.

Here are two minimal reproducible examples:

Example 1 shows that everything starting from # to the end of the line is dropped:

import pydot
G = pydot.graph_from_dot_data("""
graph G {
    b1 [label=<
         We are the knights who say <BR/># Ni
    >];
}
""")
print(G[0])

Output:

graph G {
b1 [label=<
         We are the knights who say <BR/>
    >];
}

Notice that # Ni is missing.

Example 2 shows that this can turn into a parse error when syntactically meaningful parts are lost:

import pydot
G = pydot.graph_from_dot_data("""
graph G { b4 [label=<We are the knights who say <BR/># Ni>]; }
""")

Output:

graph G { b4 [label=<We are the knights who say <BR/># Ni>]; }
             ^
Expected "}", found '['  (at char 14), (line:2, col:14)

These three lines of output are actually the "explanation" of the ParseException raised by PyParsing. Although the whole input line is printed as part of the explanation, the parser currently does not see past the # character and therefore misses the closing >]; } at the end. That is why it reports that it expected } but could not find it.
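The three-line explanation format can be reproduced with plain pyparsing. The sketch below is not pydot's own reporting code. It uses a hypothetical toy grammar that only accepts an empty graph body and a hypothetical helper `explain()` that prints the failing input line, a caret under the failing column, and the exception message. `line` and `column` are the attributes pyparsing's `ParseException` provides for this.

```python
import pyparsing as pp


def explain(err):
    """Hypothetical helper: format a ParseException as three lines, i.e. the
    offending input line, a caret under the failing column, and the message."""
    caret = " " * (err.column - 1) + "^"
    return "\n".join([err.line, caret, str(err)])


# Hypothetical toy grammar, far simpler than pydot's dot_parser: it accepts
# only an empty graph body, so any statement inside the braces is unexpected.
toy_graph = pp.Keyword("graph") + pp.Word(pp.alphas) + pp.Literal("{") + pp.Literal("}")

text = "graph G { b4 [label=<x>]; }"
try:
    toy_graph.parseString(text)
except pp.ParseException as err:
    print(explain(err))
```

Here the caret lands under b4, because that is where the toy grammar expected }. As in Example 2, the position reported is where the grammar gave up, not necessarily where the real problem lies.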

Obviously, the # is being treated as the start of a comment, but there is more to it than that, because it does not happen in all cases. Other tests I did suggest that the word-by-word parsing, with whitespace as delimiters, plays a role as well. I suspect that, in the end, fixing this bug will require tweaking how dot_parser constructs the DOT language in PyParsing terms.
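To illustrate the suspicion above, here is a minimal sketch using a hypothetical mini-grammar, not pydot's actual dot_parser. It assumes two things: that an HTML-like label is parsed as a nested <...> expression, and that # up to the end of the line is registered as an ignorable "preprocessor line". The DOT language does discard lines beginning with #, but in this sketch the ignorable is not anchored to the start of a line. pyparsing skips ignorable expressions only where a new token may start, for example right after the nested <BR/> element closes. In the middle of a run of label text, the # is consumed as ordinary content. If pydot's grammar behaves similarly (unverified), that would explain why the bug only shows up after a tag.

```python
import pyparsing as pp

# Hypothetical mini-grammar, NOT pydot's actual dot_parser: an HTML-like label
# is a nested <...> expression whose text content is any run of characters
# other than '<' and '>'.
html_label = pp.nestedExpr(opener="<", closer=">", content=pp.CharsNotIn("<>"))

# Assumption: '#' up to the end of the line is an ignorable "preprocessor line".
# Note that it is not anchored to the start of a line here.
preprocessor_line = pp.Literal("#") + pp.restOfLine
html_label.ignore(preprocessor_line)

# Inside a run of label text, '#' is consumed as ordinary content:
print(html_label.parseString("<We are the knights who say # Ni>").asList())

# Right after the nested <BR/> element closes, a new token may start, so the
# ignorable fires. In a multi-line label the rest of the line is silently
# dropped (compare Example 1):
print(html_label.parseString("<\n  We are the knights who say <BR/># Ni\n>").asList())

# On a single line the closing '>' is swallowed as well, so parsing fails
# (compare Example 2):
try:
    html_label.parseString("<We are the knights who say <BR/># Ni>")
except pp.ParseException as err:
    print("ParseException:", err)
```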

I will try to post more details later as I find time. If someone else wants to dive into this, please let me know so that we won't be doing the same work twice.

Versions used for examples: Python 3.7.3, pydot 1.4.1+PR227, pyparsing 2.4.7.


peternowee added bug dot-language labels Aug 29, 2020

peternowee added this to the 1.5.0 or 2.0.0 milestone Aug 29, 2020

peternowee mentioned this issue Aug 29, 2020

Problem with graph.write.png...AssertionError: 1 #203

Closed

peternowee added the parser label Sep 7, 2020

peternowee mentioned this issue Jun 30, 2021

TypeError in parsing dot file #269

Closed

MarcCote mentioned this issue Oct 27, 2021

pyparsing 3.0.2 broke pydot pyparsing/pyparsing#319

Closed
