Fix parsing TYPEIDs in declarators #169
…adding `typeid_declarator`
… least one `type specifier`
…isallow `typeid`s once we've seen a `type_specifier`
…sn't start with a TYPEID
Can you provide a higher-level overview of what's being done here? Also, it would be nice to add more testing since this is a major change. [FWIW I'm squashing all commits in Pull Requests]
Here's an excerpt from my comments on #46:
Since I wrote that, there was one other change required, which is to disallow
I agree it is a big change, and I can provide more tests. I can also add some more detail in the comments if you'd like.
Thanks for the additional details. I'm inclined to accept this PR after we work through the review.
More tests would certainly be good. At the very least, the cases described in http://eli.thegreenplace.net/2011/05/02/the-context-sensitivity-of-cs-grammar-revisited should be tested; anything beyond those would be welcome too.
I also left some code review comments.
@@ -347,7 +353,7 @@ def _fix_decl_name_type(self, decl, typename):
                                 coord=typename[0].coord)
         return decl

-    def _add_declaration_specifier(self, declspec, newspec, kind):
+    def _add_declaration_specifier(self, declspec, newspec, kind, append=False):
Describe the `append` parameter in the docstring.
pycparser/c_parser.py
Outdated
def p_declarator_2(self, p):
    """ declarator : pointer direct_declarator
@parameterized(('id', 'ID'), ('typeid', 'TYPEID'), ('typeid_noparen', 'TYPEID'))
def _p_XXX_declarator_1(self, p):
It's unfortunate that these rules both have the decorator and need a special name. Can't the decorator just attach a function attribute that would be recognized?
What do you mean by "special name", the XXX part or the leading underscore? If you're talking about the XXX, I had assumed that the `p_` functions had to be named the same as their nonterminal. If that's not the case, I can remove the XXX.
If you're talking about the underscore, it's required so the function doesn't get treated as a rule directly. If you omitted the underscore, you'd have to either rename or delete the template method to keep this from happening, which means you'd need different behavior when first instantiating a CParser.
This arises because I wrote `_create_param_rules()` in a similar way as `create_opt_rule()`, which is called during `__init__()`. What makes more sense to me is to instead use a very simple metaclass so this modification (or creation) of methods only happens at class-creation time, not every time a parser is instantiated. The metaclass's `__new__()` method simply loops through the class members, modifies them as described, and passes the modified classdict to `type()`.
On second thought, due to Python 2/3 differences, a metaclass may not be the best solution. Instead, a class decorator is probably the way to go.
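For readers following along, here is a minimal sketch of what the class-decorator idea could look like. It is not the code from this PR: the names `template` and `_add_rule`, the `ToyParser` class, and the use of `XXX`/`YYY` as placeholders in both the method name and the docstring are assumptions made for illustration. Only `parameterized` and the `_params` attribute appear in the diff under review.

```python
def parameterized(*params):
    """Mark a rule template with (XXX, YYY) substitution pairs."""
    def decorate(rule_func):
        rule_func._params = params
        return rule_func
    return decorate


def _add_rule(cls, func, xxx, yyy):
    """Attach one concrete rule method generated from a template."""
    def rule(self, p):
        return func(self, p)
    # ply.yacc reads the grammar from the docstring, so substitute there too.
    rule.__name__ = func.__name__.replace('XXX', xxx)
    rule.__doc__ = func.__doc__.replace('XXX', xxx).replace('YYY', yyy)
    setattr(cls, rule.__name__, rule)


def template(cls):
    """Hypothetical class decorator: expand templates once, at class creation."""
    for name, func in list(vars(cls).items()):
        if name.startswith('p_') and hasattr(func, '_params'):
            # Remove the template so it is never treated as a rule itself.
            delattr(cls, name)
            for xxx, yyy in func._params:
                _add_rule(cls, func, xxx, yyy)
    return cls


@template
class ToyParser(object):
    @parameterized(('id', 'ID'), ('typeid', 'TYPEID'))
    def p_XXX_declarator_1(self, p):
        """ XXX_declarator : YYY """
        return p
```

Because the expansion runs when the class is defined, the template needs no leading underscore and nothing has to happen per instance. The same code works unchanged on Python 2 and 3, which is the advantage over a metaclass mentioned above.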
Ah, OK. That makes sense. I don't think the extra complexity is worth it (having a metaclass or class decorator here), so we can keep it as is.
pycparser/plyparser.py
Outdated
def _create_param_rules(self, bound_rule_method):
    """ Creates ply.yacc rules based on a parameterized bound method. """
    f = bound_rule_method.__func__
    for xxx, yyy in f._params:
Needs more comments
Thanks for taking the time to review this. I've asked for clarification on one of your comments, and in the meantime I'll start working on implementing more tests and adding better documentation.
I'm at a bit of a crossroads, and I'd like your input. All of the test cases in your blog post were already in place, except the last one, which goes like this:
This code parses without choking, but incorrectly treats both
I'm thinking the best route would be to submit those changes as a separate PR, since this one is already quite large. Does that sound OK to you, or would you prefer them to be changed all at once? Again, thanks for taking the time to deal with this!
I think it's OK handling this special case in a separate PR. It may take me a couple of days to get back to this review -- I'm not forgetting, just juggling different things ATM.
There are a couple of remaining review comments, both minor. Please address them and I'll merge.
Hopefully I didn't get too far out front, but I already added the class decorator approach in d53e36a and subsequently removed the need for underscores in 79100bd. The code's basically the same as before, it's just run via a class decorator rather than in
As for the tests and extra comments/documentation, they've all been remedied in yesterday's commits. Please let me know if there's anything else left outstanding.
Merged, thanks!
Awesome, thanks!
This currently breaks https://github.com/pyca/cryptography build
Using pycparser at 599a495 and all other related packages versions unchanged works fine.
I would bring this up with the If you try to compile
with gcc or clang, both will fail to parse, having interpreted the second
Maybe we could add a minor way for users of
Here's a basic proposal: Add a dummy token type called
@eliben I don't actually understand all the lower level stuff going on here. Huge thanks for your great work.
@asavah fair enough, thanks for reporting!
Just to close out this thread, I've moved the discussion to a new cffi issue. Depending how that goes, I may start a new issue/PR on what pycparser can do to help.
I just cloned from master and I'm seeing a number of errors and warnings, and it looks like they're related to this change:
ERROR: c_parser.py:630: Symbol 'id_init_declarator_list_opt' used, but not defined as a token or a rule
This reverts commit 14b8a7e. Conflicts: pycparser/plyparser.py
This fixes the parsing of TYPEIDs in declarators (and related expressions) once and for all, removing existing workarounds for specific cases of the problem. In particular, it solves the problem in the current parser where a TYPEID is used in a list of multiple declarators, for which there is no workaround.
All tests are passing for me, and I added 2 more tests related to parsing declarators correctly.
I've tried to organize the commits as logically and discretely as possible to make it clear what is happening in each step:
- `declaration-specifiers` and `specifier-qualifier-list` to contain at least one type-specifier, and only one if it is a `typedef-name`
- `parameter-declaration`s to interpret a TYPEID as a `typedef-name` in cases of ambiguity

It is likely there is some more leftover "workaround" code that can be removed, but I decided it was best to do the PR as-is for now, as that may take some careful thought (and maybe more tests to ensure no change in behavior).
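The type-specifier constraint described above can be illustrated with a toy model. This is only a sketch of the rule as stated in this description, not the grammar pycparser implements: the function name, the keyword sets, and the flat list-of-strings token format are all assumptions. The point it shows is that once a type specifier has been seen, a following TYPEID can only be the start of the declarator.

```python
# Hypothetical, simplified keyword sets; a real C grammar has more.
TYPE_KEYWORDS = {'void', 'char', 'short', 'int', 'long',
                 'float', 'double', 'signed', 'unsigned'}
OTHER_SPECIFIERS = {'const', 'volatile', 'static', 'extern',
                    'typedef', 'register', 'auto'}


def split_declaration_specifiers(tokens, typedef_names):
    """Split tokens into (declaration specifiers, remaining tokens)."""
    specifiers = []
    seen_type = False
    seen_typedef_name = False
    for index, tok in enumerate(tokens):
        if tok in typedef_names:
            if seen_type:
                break  # a type is already present: this TYPEID is the declarator
            seen_type = seen_typedef_name = True
        elif tok in TYPE_KEYWORDS:
            if seen_typedef_name:
                break  # a typedef-name must be the only type specifier
            seen_type = True
        elif tok not in OTHER_SPECIFIERS:
            break  # anything else ends the specifier list
        specifiers.append(tok)
    else:
        index = len(tokens)  # every token was a specifier
    if not seen_type:
        raise ValueError('expected at least one type specifier')
    return specifiers, tokens[index:]


# 'T' is a typedef name; the second 'T' is redeclared as an identifier.
specs, rest = split_declaration_specifiers(['const', 'T', 'T', ';'], {'T'})
```

In this model `unsigned int T` also treats `T` as the declared name, which is the kind of case where a declarator names something that shadows an existing typedef.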