
Fix parsing TYPEIDs in declarators #169

Merged
merged 24 commits into from Feb 22, 2017

Conversation

natezb
Contributor

@natezb natezb commented Feb 15, 2017

This fixes the parsing of TYPEIDs in declarators (and related expressions) once and for all, removing existing workarounds for specific cases of the problem. In particular, it solves the problem in the current parser where a TYPEID is used in a list of multiple declarators, for which there is no workaround.

All tests are passing for me, and I added 2 more tests related to parsing declarators correctly.

I've tried to organize the commits as logically and discretely as possible to make it clear what is happening in each step:

  1. Remove workaround productions
  2. Allow TYPEIDs in declarators
  3. Restrict declaration-specifiers and specifier-qualifier-list to contain at least one type-specifier, and only one if it is a typedef-name
  4. Force parameter-declarations to interpret a TYPEID as a typedef-name in cases of ambiguity

It is likely there is some more leftover "workaround" code that can be removed, but I decided it was best to do the PR as-is for now, as that may take some careful thought (and maybe more tests to ensure no change in behavior).

@eliben
Owner

eliben commented Feb 16, 2017

Can you provide a higher-level overview of what's being done here?

Also, it would be nice to add more testing since this is a major change.

[FWIW I'm squashing all commits in Pull Requests]

@natezb
Contributor Author

natezb commented Feb 16, 2017

Here's an excerpt from my comments on #46:

The essential change is to allow a TYPEID to be part of a declarator. We introduce two categories of declarator: (1) an id-declarator (equivalent to what is currently just a 'declarator'), and (2) a typeid-declarator, which is identical but uses a TYPEID rather than an ID. In most cases, these two get recombined into a generic 'declarator', which can be either of the two. In the special case of a function definition, its declarator must be an id-declarator (this restriction is valid because the function's name cannot already have been typedef'd: any such typedef would have had to occur in the same outermost scope).

However, allowing TYPEIDs in declarators will obviously introduce some parsing conflicts. To fix these, the other main change is to restrict declaration-specifiers (and specifier-qualifier-list) in two ways: (1) it must contain at least one type-specifier, and (2) if it contains a typedef-name, it may not contain any other type-specifiers. This is required so that the parser knows when to switch from parsing a specifiers list to parsing a declarator. It's also consistent with declaration-specifiers as described in section 6.7.2.2 of ISO/IEC 9899:TC3.

After making these changes, you can then remove the existing productions that are workarounds to deal with incorrect parses.
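The restriction on declaration-specifiers described above can be sketched as a small predicate. This is an illustrative sketch, not pycparser's actual code; the function name and the token representation are made up for the example:

```python
# Sketch of the restriction on declaration-specifiers: the list must
# contain at least one type-specifier, and if a typedef-name appears,
# it must be the ONLY type-specifier present.
#
# type_specifiers: list of ('typedef-name', name) or ('keyword', kw)
# tuples, representing the type-specifiers collected so far.

def specifiers_ok(type_specifiers):
    if not type_specifiers:
        return False  # must contain at least one type-specifier
    has_typedef_name = any(kind == 'typedef-name'
                           for kind, _ in type_specifiers)
    if has_typedef_name and len(type_specifiers) > 1:
        return False  # a typedef-name may not mix with other type-specifiers
    return True
```

Under this rule, `unsigned int` and a lone typedef'd name `T` are both valid, while `int T` is rejected, which is exactly what lets the parser know the specifier list has ended and a declarator has begun.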

Since I wrote that, there was one other change required, which is to disallow typeid-declarators nested inside parens as part of a parameter declaration. This is because in this situation the TYPEID should always be treated as a typedef-name, per section 6.7.5.3.11 of ISO/IEC 9899:TC3:

If, in a parameter declaration, an identifier can be treated either as a typedef name or as a parameter name, it shall be taken as a typedef name.

I agree it is a big change, and I can provide more tests. I can also add some more detail in the comments if you'd like.

Owner

@eliben eliben left a comment


Thanks for the additional details. I'm inclined to accept this PR after we work through the review.

More tests would certainly be good. At the very least, the cases described in http://eli.thegreenplace.net/2011/05/02/the-context-sensitivity-of-cs-grammar-revisited should be tested; more tests beyond those would be welcome too.

I also left some code review comments.

@@ -347,7 +353,7 @@ def _fix_decl_name_type(self, decl, typename):
coord=typename[0].coord)
return decl

def _add_declaration_specifier(self, declspec, newspec, kind):
def _add_declaration_specifier(self, declspec, newspec, kind, append=False):
Owner


Describe the append parameter in the docstring

def p_declarator_2(self, p):
""" declarator : pointer direct_declarator
@parameterized(('id', 'ID'), ('typeid', 'TYPEID'), ('typeid_noparen', 'TYPEID'))
def _p_XXX_declarator_1(self, p):
Owner


It's unfortunate that these rules both have the decorator and need a special name. Can't the decorator just attach a function attribute that would be recognized?

Contributor Author


What do you mean by "special name", the XXX part or the leading underscore? If you're talking about the XXX, I had assumed that the p_ functions had to be named the same as their nonterminal. If that's not the case, I can remove the XXX.

If you're talking about the underscore, it's required so the function doesn't get treated as a rule directly. If you omitted the underscore, you'd have to either rename or delete the template method to keep this from happening, which means you'd need different behavior when first instantiating a CParser.

This arises because I wrote _create_param_rules() in a similar way to create_opt_rule(), which is called during __init__(). What makes more sense to me is to instead use a very simple metaclass, so that this modification (or creation) of methods only happens at class-creation time, not every time a parser is instantiated. The metaclass's __new__() method simply loops through the class members, modifies them as described, and passes the modified classdict to type().

Contributor Author


On second thought, due to Python 2/3 differences, a metaclass may not be the best solution. Instead, a class decorator is probably the way to go.
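The class-decorator idea can be sketched roughly as follows. This is a hedged illustration of the approach being described, not pycparser's actual code; the names (parameterized, expand_templates, MiniParser) and the 'xxx'/'yyy' placeholder convention are assumptions modeled on the snippets shown in this review:

```python
def parameterized(*params):
    """Tag a template rule method with its (name, token) substitution pairs."""
    def decorate(func):
        func._params = params
        return func
    return decorate

def expand_templates(cls):
    """Class decorator: expand each @parameterized template method into
    concrete rule methods, substituting 'xxx' in the method name and
    'xxx'/'yyy' in the docstring (the grammar rule), then delete the
    template itself so a tool like ply never sees it as a rule."""
    for name, attr in list(cls.__dict__.items()):
        params = getattr(attr, '_params', None)
        if params is None:
            continue  # not a parameterized template
        for xxx, yyy in params:
            def rule(self, p, _template=attr):
                return _template(self, p)
            rule.__doc__ = attr.__doc__.replace('xxx', xxx).replace('yyy', yyy)
            setattr(cls, name.replace('xxx', xxx), rule)
        delattr(cls, name)  # remove the template method
    return cls

@expand_templates
class MiniParser:
    @parameterized(('id', 'ID'), ('typeid', 'TYPEID'))
    def p_xxx_declarator(self, p):
        """ xxx_declarator : yyy """
        return p
```

After class creation, MiniParser has concrete p_id_declarator and p_typeid_declarator methods (with the substituted grammar rules in their docstrings), and the p_xxx_declarator template is gone, so no leading underscore or instance-time rewriting is needed.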

Owner


Ah, OK. That makes sense. I don't think the extra complexity is worth it (having a metaclass or class decorator here), so we can keep it as is.

def _create_param_rules(self, bound_rule_method):
""" Creates ply.yacc rules based on a parameterized bound method. """
f = bound_rule_method.__func__
for xxx, yyy in f._params:
Owner


Needs more comments

@natezb
Contributor Author

natezb commented Feb 19, 2017

Thanks for taking the time to review this. I've asked for clarification on one of your comments, and in the meantime I'll start working on implementing more tests and adding better documentation.

@natezb
Contributor Author

natezb commented Feb 20, 2017

I'm at a bit of a crossroads, and I'd like your input. All of the test cases in your blog post were already in place, except the last one, which goes like this:

typedef char AA;
void foo()
{
  int aa = sizeof(AA), AA, bb = sizeof(AA);
}

This code parses without choking, but it incorrectly treats both occurrences of AA inside the sizeof expressions as types, when it should treat the second one as an expression. This isn't disastrous, since this code won't even parse before this PR. However, I would like to ultimately get this right, which will involve some changes to the machinery of how declarations are created: they would have to be generated as each declarator is parsed, rather than all at once in the decl-body production.
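The reason per-declarator processing matters can be sketched with a toy scope model. This is an illustrative sketch, not pycparser code; the function name and data representation are made up for the example:

```python
# Each declarator in a declaration can shadow a typedef name, which
# changes how names referenced by LATER initializers in the same
# declaration must be classified. Processing declarators one at a time
# (left to right) captures this; processing them all at once cannot.
#
# typedefs:    set of names typedef'd in the enclosing scope
# declarators: list of (declared_name, names_used_by_initializer)

def classify_uses(typedefs, declarators):
    scope = set(typedefs)
    result = []
    for declared_name, uses in declarators:
        for name in uses:
            kind = 'type' if name in scope else 'expression'
            result.append((name, kind))
        # From here on, the declared name shadows any typedef of it.
        scope.discard(declared_name)
    return result
```

For the example above (typedef char AA; int aa = sizeof(AA), AA, bb = sizeof(AA);), the first AA inside sizeof classifies as a type, while the second, coming after the AA declarator, classifies as an expression.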

I'm thinking the best route would be to submit those changes as a separate PR, since this one is already quite large. Does that sound OK to you, or would you prefer them to be changed all at once?

Again, thanks for taking the time to deal with this!

@eliben
Owner

eliben commented Feb 20, 2017

I think it's OK handling this special case in a separate PR. It may take me a couple of days to get back to this review -- I'm not forgetting, just juggling different things ATM.

@eliben
Owner

eliben commented Feb 22, 2017

There are a couple of remaining review comments, both minor. Please address them and I'll merge.

@natezb
Contributor Author

natezb commented Feb 22, 2017

Hopefully I didn't get too far ahead, but I already added the class decorator approach in d53e36a and subsequently removed the need for underscores in 79100bd. The code is basically the same as before; it's just run via a class decorator rather than in __init__(), which makes a lot more sense to me.

As for the tests and extra comments/documentation, they've all been remedied in yesterday's commits. Please let me know if there's anything else left outstanding.

@eliben eliben merged commit 14b8a7e into eliben:master Feb 22, 2017
@eliben
Owner

eliben commented Feb 22, 2017

Merged, thanks!

@natezb
Contributor Author

natezb commented Feb 22, 2017

Awesome, thanks!

@asavah

asavah commented Feb 23, 2017

This currently breaks the https://github.com/pyca/cryptography build:

Traceback (most recent call last):
  File "setup.py", line 335, in <module>
    **keywords_with_side_effects(sys.argv)
  File "/home/asavah/kross/host/lib/python2.7/distutils/core.py", line 111, in setup
    _setup_distribution = dist = klass(attrs)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/setuptools-34.2.0-py2.7.egg/setuptools/dist.py", line 320, in __init__
    _Distribution.__init__(self, attrs)
  File "/home/asavah/kross/host/lib/python2.7/distutils/dist.py", line 287, in __init__
    self.finalize_options()
  File "/home/asavah/kross/host/lib/python2.7/site-packages/setuptools-34.2.0-py2.7.egg/setuptools/dist.py", line 387, in finalize_options
    ep.load()(self, ep.name, value)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/setuptools_ext.py", line 187, in cffi_modules
    add_cffi_module(dist, cffi_module)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/setuptools_ext.py", line 49, in add_cffi_module
    execfile(build_file_name, mod_vars)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/setuptools_ext.py", line 25, in execfile
    exec(code, glob, glob)
  File "src/_cffi_src/build_openssl.py", line 94, in <module>
    extra_link_args=extra_link_args(compiler_type()),
  File "src/_cffi_src/utils.py", line 61, in build_ffi_for_binding
    extra_link_args=extra_link_args,
  File "src/_cffi_src/utils.py", line 70, in build_ffi
    ffi.cdef(cdef_source)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/api.py", line 105, in cdef
    self._cdef(csource, override=override, packed=packed)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/api.py", line 119, in _cdef
    self._parser.parse(csource, override=override, **options)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/cparser.py", line 299, in parse
    self._internal_parse(csource)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/cparser.py", line 304, in _internal_parse
    ast, macros, csource = self._parse(csource)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/cparser.py", line 262, in _parse
    self.convert_pycparser_error(e, csource)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/cparser.py", line 291, in convert_pycparser_error
    raise api.CDefError(msg)
cffi.api.CDefError: cannot parse "typedef int __dotdotdot__  time_t;"
:21:28: before: time_t

Using pycparser at 599a495, with all other related package versions unchanged, works fine.

@natezb
Contributor Author

natezb commented Feb 23, 2017

I would bring this up with the cffi people. It seems they're using a hack to add special functionality, one that relied on pycparser incorrectly allowing multiple type-specifiers in declaration specifiers. They replace '...' in certain situations with the name '__dotdotdot__', which they've typedef'd somewhere else.

If you try to compile

typedef int __dotdotdot__;
typedef int __dotdotdot__ time_t;

with gcc or clang, both fail to parse it, having interpreted the second __dotdotdot__ as a declarator.

Maybe we could add a simple way for users of pycparser to do this sort of thing, but I'm not sure how Eli feels about it. I do know that cffi uses another hack (volatile volatile const) to work around handling of __stdcall, which it would be nice to get rid of.

Here's a basic proposal: add a dummy token type called USERQUOTE, which can be any sequence of characters surrounded by backticks (or whatever invalid-C syntax we decide on). Then we say USERQUOTE is a valid type-qualifier. Now cffi or whoever can insert arbitrary notes to themselves wherever type qualifiers are allowed. I just tested this out, and it works with 6 lines of code added.
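The lexing side of the USERQUOTE idea can be sketched with a single regex. This is a rough illustration of the proposal, not an existing pycparser feature; the token name, regex, and helper function are hypothetical:

```python
import re

# A USERQUOTE token: any sequence of non-backtick characters surrounded
# by backticks, which is invalid C and so cannot collide with real code.
# The grammar would then accept USERQUOTE wherever a type-qualifier is
# allowed, letting tools like cffi embed annotations for themselves.
USERQUOTE_RE = re.compile(r'`[^`]*`')

def extract_userquotes(source):
    """Return all backtick-quoted annotations found in a cdef-style source."""
    return USERQUOTE_RE.findall(source)
```

For example, a client could write `typedef int `dotdotdot` time_t;` and recover its own annotation afterwards, instead of smuggling it through a fake typedef name.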

@eliben
Owner

eliben commented Feb 26, 2017

@asavah is the original incoming C code into pycparser incorrect, as @natezb points out? I suggest discussing possible workarounds separately, or opening a more concrete bug against pycparser if it seems like it's parsing something wrongly.

@asavah

asavah commented Feb 26, 2017

@eliben I don't actually understand all the lower-level stuff going on here.
I stumbled into the issue while building cryptography (git) with pycparser (git), so I decided to report it in the PR that broke the previously working behaviour, just to let you and possibly others know.
It's up to you to decide whether to fix this or report it to the cffi people, since they're doing something nasty that they're not supposed to from pycparser's POV.

Huge thanks for your great work.

@eliben
Owner

eliben commented Feb 26, 2017

@asavah fair enough, thanks for reporting!

@natezb
Contributor Author

natezb commented Feb 26, 2017

Just to close out this thread, I've moved the discussion to a new cffi issue. Depending how that goes, I may start a new issue/PR on what pycparser can do to help.

@natezb natezb deleted the master branch February 28, 2017 01:30
@jiffe

jiffe commented Apr 20, 2017

I just cloned from master, and I'm seeing a number of errors and warnings that look related to this change:

ERROR: c_parser.py:630: Symbol 'id_init_declarator_list_opt' used, but not defined as a token or a rule
ERROR: c_parser.py:709: Symbol 'declaration_specifiers_no_type_opt' used, but not defined as a token or a rule
ERROR: c_parser.py:714: Symbol 'declaration_specifiers_no_type_opt' used, but not defined as a token or a rule
ERROR: c_parser.py:719: Symbol 'declaration_specifiers_no_type_opt' used, but not defined as a token or a rule
WARNING: plyparser.py:42: Rule 'specifier_qualifier_list_opt' defined, but not used
WARNING: plyparser.py:42: Rule 'declaration_specifiers_opt' defined, but not used
WARNING: There are 2 unused rules
WARNING: Symbol 'id_init_declarator' is unreachable
WARNING: Symbol 'id_init_declarator_list' is unreachable
WARNING: Symbol 'specifier_qualifier_list_opt' is unreachable
WARNING: Symbol 'declaration_specifiers_opt' is unreachable
ERROR: Infinite recursion detected for symbol 'direct_declarator'
ERROR: Infinite recursion detected for symbol 'declaration_specifiers_no_type'

jiffe pushed a commit to jiffe/pycparser that referenced this pull request Apr 24, 2017
This reverts commit 14b8a7e.

Conflicts:

	pycparser/plyparser.py