Parser breaks off lines at hash symbol following HTML tag #235

Open
peternowee opened this issue Aug 29, 2020 · 0 comments

@peternowee
Member

This issue is split off from issue #203. There we found that the parser truncates lines inside DOT HTML-like labels containing <BR/>#: everything from the hash sign (#, also called pound sign or number sign) onward is cut off.

Here are two minimal reproducible examples:

Example 1 shows that everything starting from # to the end of the line is dropped:

import pydot
G = pydot.graph_from_dot_data("""
graph G {
    b1 [label=<
         We are the knights who say <BR/># Ni
    >];
}
""")
print(G[0])

Output:

graph G {
b1 [label=<
         We are the knights who say <BR/>
    >];
}

Notice that # Ni is missing.
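
For anyone who wants to turn this into a regression check, here is a minimal sketch using only the standard library. It does not call pydot. It compares the Example 1 input with the output reported above (copied from this issue, not regenerated) and lists every "#..." line tail that went missing. The helper name dropped_hash_tails is hypothetical and not part of pydot.

```python
from typing import List


def dropped_hash_tails(dot_source: str, rendered: str) -> List[str]:
    """Return each '#...' line tail of dot_source that is absent from rendered."""
    missing = []
    for line in dot_source.splitlines():
        # Split at the first '#'; sep is empty when the line has none.
        _, sep, tail = line.partition("#")
        if sep:
            fragment = (sep + tail).rstrip()
            if fragment not in rendered:
                missing.append(fragment)
    return missing


# Input of Example 1.
source = """
graph G {
    b1 [label=<
         We are the knights who say <BR/># Ni
    >];
}
"""

# Output reported in Example 1 (copied from the issue text, not regenerated).
reported_output = """graph G {
b1 [label=<
         We are the knights who say <BR/>
    >];
}
"""

print(dropped_hash_tails(source, reported_output))
```

In a real test, reported_output would be replaced by the string form of the graph returned by pydot.graph_from_dot_data, and the check would be that the returned list is empty.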

Example 2 shows that this can become a parsing error when parts with a syntactic meaning are lost:

import pydot
G = pydot.graph_from_dot_data("""
graph G { b4 [label=<We are the knights who say <BR/># Ni>]; }
""")

Output:

graph G { b4 [label=<We are the knights who say <BR/># Ni>]; }
             ^
Expected "}", found '['  (at char 14), (line:2, col:14)

These three lines of output are the "explanation" of the ParseException raised by PyParsing. Although the explanation prints the input line in full, the parser does not see past the # character, so it never reaches the closing >]; } at the end of the line. That is why it says it expected } but could not find it.
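
When reading that report, note that the caret marks the column where the parser gave up, not the position of the #. The sketch below rebuilds the same three-line layout from a line, a 1-based column, and a message, using only the standard library. The helper caret_report is hypothetical; it only imitates the layout shown above and does not call pydot or PyParsing.

```python
def caret_report(line: str, column: int, message: str) -> str:
    """Format a three-line report: source line, caret under `column`, message.

    `column` is 1-based, matching the "col:14" in the output of Example 2.
    """
    return "\n".join([line, " " * (column - 1) + "^", message])


line = "graph G { b4 [label=<We are the knights who say <BR/># Ni>]; }"
message = "Expected \"}\", found '['  (at char 14), (line:2, col:14)"

print(caret_report(line, 14, message))

# The caret sits under the '[' that opens the attribute list ...
print(repr(line[14 - 1]))
# ... while the '#' that triggers the truncation is much further right.
print(line.index("#") + 1 > 14)
```

So the reported position is where the parser stopped matching the graph body, well before the # that caused the rest of the line to be dropped.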

Obviously, the # is being treated as the start of a comment, but that is not the whole story, because it does not happen in all cases. Other tests I ran suggest that the word-by-word parsing, with whitespace as the delimiter, plays a role as well. I suspect that fixing this bug will ultimately require adjusting how dot_parser expresses the DOT language in PyParsing terms.

I will try to post more details later as I find time. If someone else wants to dive into this, please let me know so that we won't be doing the same work twice.

Versions used for examples: Python 3.7.3, pydot 1.4.1+PR227, pyparsing 2.4.7.
