
Fix parsing TYPEIDs in declarators #169

Merged
merged 24 commits into from Feb 22, 2017

Conversation

natezb
Contributor

@natezb natezb commented Feb 15, 2017

This fixes the parsing of TYPEIDs in declarators (and related expressions) once and for all, removing existing workarounds for specific cases of the problem. In particular, it solves the problem in the current parser where a TYPEID is used in a list of multiple declarators, for which there is no workaround.

All tests are passing for me, and I added 2 more tests related to parsing declarators correctly.

I've tried to organize the commits as logically and discretely as possible to make it clear what is happening in each step:

  1. Remove workaround productions
  2. Allow TYPEIDs in declarators
  3. Restrict declaration-specifiers and specifier-qualifier-list to contain at least one type-specifier, and only one if it is a typedef-name
  4. Force parameter-declarations to interpret a TYPEID as a typedef-name in cases of ambiguity

It is likely there is some more leftover "workaround" code that can be removed, but I decided it was best to do the PR as-is for now, as that may take some careful thought (and maybe more tests to ensure no change in behavior).

@eliben
Owner

eliben commented Feb 16, 2017

Can you provide a higher-level overview of what's being done here?

Also, it would be nice to add more testing since this is a major change.

[FWIW I'm squashing all commits in Pull Requests]

@natezb
Contributor Author

natezb commented Feb 16, 2017

Here's an excerpt from my comments on #46:

The essential change is to allow a TYPEID to be part of a declarator. We introduce two categories of declarator: (1) an id-declarator (equivalent to what is currently just a 'declarator'), and (2) a typeid-declarator, which is identical but uses a TYPEID rather than an ID. In most cases, these two get recombined into a generic 'declarator', which can be either of the two. In the special case of a function definition, its declarator must be an id-declarator (this restriction is valid because the function's name cannot already have been typedef'd: any such typedef would have had to occur in the same outermost scope).

However, allowing TYPEIDs in declarators will obviously introduce some parsing conflicts. To fix these, the other main change is to restrict declaration-specifiers (and specifier-qualifier-list) in two ways: (1) it must contain at least one type-specifier, and (2) if it contains a typedef-name, it may not contain any other type-specifiers. This is required so that the parser knows when to switch from parsing a specifiers list to parsing a declarator. It's also consistent with declaration-specifiers as described in section 6.7.2.2 of ISO/IEC 9899:TC3.

After making these changes, you can then remove the existing productions that are workarounds to deal with incorrect parses.
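The restriction on declaration-specifiers described above can be sketched as a small predicate. This is an illustrative sketch, not pycparser's actual code; the function name and the token representation are made up for the example:

```python
# Sketch of the restriction on declaration-specifiers: the list must
# contain at least one type-specifier, and if a typedef-name appears,
# it must be the ONLY type-specifier present.
#
# type_specifiers: list of ('typedef-name', name) or ('keyword', kw)
# tuples, representing the type-specifiers collected so far.

def specifiers_ok(type_specifiers):
    if not type_specifiers:
        return False  # must contain at least one type-specifier
    has_typedef_name = any(kind == 'typedef-name'
                           for kind, _ in type_specifiers)
    if has_typedef_name and len(type_specifiers) > 1:
        return False  # a typedef-name may not mix with other type-specifiers
    return True
```

Under this rule, `unsigned int` and a lone typedef'd name `T` are both valid, while `int T` is rejected, which is exactly what lets the parser know the specifier list has ended and a declarator has begun.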

Since I wrote that, there was one other change required, which is to disallow typeid-declarators nested inside parens as part of a parameter declaration. This is because in this situation the TYPEID should always be treated as a typedef-name, per section 6.7.5.3.11 of ISO/IEC 9899:TC3:

If, in a parameter declaration, an identifier can be treated either as a typedef name or as a parameter name, it shall be taken as a typedef name.

I agree it is a big change, and I can provide more tests. I can also add some more detail in the comments if you'd like.

Owner

@eliben eliben left a comment


Thanks for the additional details. I'm inclined to accept this PR after we work through the review.

More tests would certainly be good. At the very least, the cases described in http://eli.thegreenplace.net/2011/05/02/the-context-sensitivity-of-cs-grammar-revisited should be tested; more tests beyond those would be welcome too.

I also left some code review comments.

@@ -347,7 +353,7 @@ def _fix_decl_name_type(self, decl, typename):
coord=typename[0].coord)
return decl

def _add_declaration_specifier(self, declspec, newspec, kind):
def _add_declaration_specifier(self, declspec, newspec, kind, append=False):
Owner


Describe the append parameter in the docstring

def p_declarator_2(self, p):
""" declarator : pointer direct_declarator
@parameterized(('id', 'ID'), ('typeid', 'TYPEID'), ('typeid_noparen', 'TYPEID'))
def _p_XXX_declarator_1(self, p):
Owner


It's unfortunate that these rules both have the decorator and need a special name. Can't the decorator just attach a function attribute that would be recognized?

Contributor Author


What do you mean by "special name", the XXX part or the leading underscore? If you're talking about the XXX, I had assumed that the p_ functions had to be named the same as their nonterminal. If that's not the case, I can remove the XXX.

If you're talking about the underscore, it's required so the function doesn't get treated as a rule directly. If you omitted the underscore, you'd have to either rename or delete the template method to keep this from happening, which means you'd need different behavior when first instantiating a CParser.

This arises because I wrote _create_param_rules() in a similar way to create_opt_rule(), which is called during __init__(). What makes more sense to me is to instead use a very simple metaclass, so that this modification (or creation) of methods only happens at class-creation time, not every time a parser is instantiated. The metaclass's __new__() method simply loops through the class members, modifies them as described, and passes the modified classdict to type().

Contributor Author


On second thought, due to Python 2/3 differences, a metaclass may not be the best solution. Instead, a class decorator is probably the way to go.
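The class-decorator idea can be sketched roughly as follows. This is a hedged illustration of the approach being described, not pycparser's actual code; the names (parameterized, expand_templates, MiniParser) and the 'xxx'/'yyy' placeholder convention are assumptions modeled on the snippets shown in this review:

```python
def parameterized(*params):
    """Tag a template rule method with its (name, token) substitution pairs."""
    def decorate(func):
        func._params = params
        return func
    return decorate

def expand_templates(cls):
    """Class decorator: expand each @parameterized template method into
    concrete rule methods, substituting 'xxx' in the method name and
    'xxx'/'yyy' in the docstring (the grammar rule), then delete the
    template itself so a tool like ply never sees it as a rule."""
    for name, attr in list(cls.__dict__.items()):
        params = getattr(attr, '_params', None)
        if params is None:
            continue  # not a parameterized template
        for xxx, yyy in params:
            def rule(self, p, _template=attr):
                return _template(self, p)
            rule.__doc__ = attr.__doc__.replace('xxx', xxx).replace('yyy', yyy)
            setattr(cls, name.replace('xxx', xxx), rule)
        delattr(cls, name)  # remove the template method
    return cls

@expand_templates
class MiniParser:
    @parameterized(('id', 'ID'), ('typeid', 'TYPEID'))
    def p_xxx_declarator(self, p):
        """ xxx_declarator : yyy """
        return p
```

After class creation, MiniParser has concrete p_id_declarator and p_typeid_declarator methods (with the substituted grammar rules in their docstrings), and the p_xxx_declarator template is gone, so no leading underscore or instance-time rewriting is needed.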

Owner


Ah, OK. That makes sense. I don't think the extra complexity is worth it (having a metaclass or class decorator here), so we can keep it as is.

def _create_param_rules(self, bound_rule_method):
""" Creates ply.yacc rules based on a parameterized bound method. """
f = bound_rule_method.__func__
for xxx, yyy in f._params:
Owner


Needs more comments

@natezb
Contributor Author

natezb commented Feb 19, 2017

Thanks for taking the time to review this. I've asked for clarification on one of your comments, and in the meantime I'll start working on implementing more tests and adding better documentation.

@natezb
Contributor Author

natezb commented Feb 20, 2017

I'm at a bit of a crossroads, and I'd like your input. All of the test cases in your blog post were already in place, except the last one, which goes like this:

typedef char AA;
void foo()
{
  int aa = sizeof(AA), AA, bb = sizeof(AA);
}

This code parses without choking, but it incorrectly treats both occurrences of AA inside the sizeof expressions as types, when it should treat the second one as an expression. This isn't disastrous, since this code won't even parse before this PR. However, I would like to ultimately get this right, which will involve some changes to the machinery of how declarations are created: they would have to be generated as each declarator is parsed, rather than all at once in the decl-body production.
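The reason per-declarator processing matters can be sketched with a toy scope model. This is an illustrative sketch, not pycparser code; the function name and data representation are made up for the example:

```python
# Each declarator in a declaration can shadow a typedef name, which
# changes how names referenced by LATER initializers in the same
# declaration must be classified. Processing declarators one at a time
# (left to right) captures this; processing them all at once cannot.
#
# typedefs:    set of names typedef'd in the enclosing scope
# declarators: list of (declared_name, names_used_by_initializer)

def classify_uses(typedefs, declarators):
    scope = set(typedefs)
    result = []
    for declared_name, uses in declarators:
        for name in uses:
            kind = 'type' if name in scope else 'expression'
            result.append((name, kind))
        # From here on, the declared name shadows any typedef of it.
        scope.discard(declared_name)
    return result
```

For the example above (typedef char AA; int aa = sizeof(AA), AA, bb = sizeof(AA);), the first AA inside sizeof classifies as a type, while the second, coming after the AA declarator, classifies as an expression.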

I'm thinking the best route would be to submit those changes as a separate PR, since this one is already quite large. Does that sound OK to you, or would you prefer them to be changed all at once?

Again, thanks for taking the time to deal with this!

@eliben
Owner

eliben commented Feb 20, 2017

I think it's OK handling this special case in a separate PR. It may take me a couple of days to get back to this review -- I'm not forgetting, just juggling different things ATM.

@eliben
Owner

eliben commented Feb 22, 2017

There are a couple of remaining review comments, both minor. Please address them and I'll merge.

@natezb
Contributor Author

natezb commented Feb 22, 2017

Hopefully I didn't get too far ahead, but I already added the class decorator approach in d53e36a and subsequently removed the need for underscores in 79100bd. The code is basically the same as before; it's just run via a class decorator rather than in __init__(), which makes a lot more sense to me.

As for the tests and extra comments/documentation, they've all been remedied in yesterday's commits. Please let me know if there's anything else left outstanding.

@eliben eliben merged commit 14b8a7e into eliben:master Feb 22, 2017
@eliben
Owner

eliben commented Feb 22, 2017

Merged, thanks!

@natezb
Contributor Author

natezb commented Feb 22, 2017

Awesome, thanks!

@asavah

asavah commented Feb 23, 2017

This currently breaks the https://github.com/pyca/cryptography build:

Traceback (most recent call last):
  File "setup.py", line 335, in <module>
    **keywords_with_side_effects(sys.argv)
  File "/home/asavah/kross/host/lib/python2.7/distutils/core.py", line 111, in setup
    _setup_distribution = dist = klass(attrs)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/setuptools-34.2.0-py2.7.egg/setuptools/dist.py", line 320, in __init__
    _Distribution.__init__(self, attrs)
  File "/home/asavah/kross/host/lib/python2.7/distutils/dist.py", line 287, in __init__
    self.finalize_options()
  File "/home/asavah/kross/host/lib/python2.7/site-packages/setuptools-34.2.0-py2.7.egg/setuptools/dist.py", line 387, in finalize_options
    ep.load()(self, ep.name, value)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/setuptools_ext.py", line 187, in cffi_modules
    add_cffi_module(dist, cffi_module)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/setuptools_ext.py", line 49, in add_cffi_module
    execfile(build_file_name, mod_vars)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/setuptools_ext.py", line 25, in execfile
    exec(code, glob, glob)
  File "src/_cffi_src/build_openssl.py", line 94, in <module>
    extra_link_args=extra_link_args(compiler_type()),
  File "src/_cffi_src/utils.py", line 61, in build_ffi_for_binding
    extra_link_args=extra_link_args,
  File "src/_cffi_src/utils.py", line 70, in build_ffi
    ffi.cdef(cdef_source)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/api.py", line 105, in cdef
    self._cdef(csource, override=override, packed=packed)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/api.py", line 119, in _cdef
    self._parser.parse(csource, override=override, **options)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/cparser.py", line 299, in parse
    self._internal_parse(csource)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/cparser.py", line 304, in _internal_parse
    ast, macros, csource = self._parse(csource)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/cparser.py", line 262, in _parse
    self.convert_pycparser_error(e, csource)
  File "/home/asavah/kross/host/lib/python2.7/site-packages/cffi-1.9.1-py2.7-linux-x86_64.egg/cffi/cparser.py", line 291, in convert_pycparser_error
    raise api.CDefError(msg)
cffi.api.CDefError: cannot parse "typedef int __dotdotdot__  time_t;"
:21:28: before: time_t

Using pycparser at 599a495, with all other related package versions unchanged, works fine.

@natezb
Contributor Author

natezb commented Feb 23, 2017

I would bring this up with the cffi people. It seems they're using a hack to add special functionality, one that relied on pycparser incorrectly allowing multiple type-specifiers in declaration specifiers. They replace '...' in certain situations with the name '__dotdotdot__', which they've typedef'd somewhere else.

If you try to compile

typedef int __dotdotdot__;
typedef int __dotdotdot__ time_t;

with gcc or clang, both fail to parse it, having interpreted the second __dotdotdot__ as a declarator.

Maybe we could add a simple way for users of pycparser to do this sort of thing, but I'm not sure how Eli feels about it. I do know that cffi uses another hack (volatile volatile const) to work around handling of __stdcall, which it would be nice to get rid of.

Here's a basic proposal: add a dummy token type called USERQUOTE, which can be any sequence of characters surrounded by backticks (or whatever invalid-C syntax we decide on). Then we say USERQUOTE is a valid type-qualifier. Now cffi or whoever can insert arbitrary notes to themselves wherever type qualifiers are allowed. I just tested this out, and it works with 6 lines of code added.
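The lexing side of the USERQUOTE idea can be sketched with a single regex. This is a rough illustration of the proposal, not an existing pycparser feature; the token name, regex, and helper function are hypothetical:

```python
import re

# A USERQUOTE token: any sequence of non-backtick characters surrounded
# by backticks, which is invalid C and so cannot collide with real code.
# The grammar would then accept USERQUOTE wherever a type-qualifier is
# allowed, letting tools like cffi embed annotations for themselves.
USERQUOTE_RE = re.compile(r'`[^`]*`')

def extract_userquotes(source):
    """Return all backtick-quoted annotations found in a cdef-style source."""
    return USERQUOTE_RE.findall(source)
```

For example, a client could write `typedef int `dotdotdot` time_t;` and recover its own annotation afterwards, instead of smuggling it through a fake typedef name.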

@eliben
Owner

eliben commented Feb 26, 2017

@asavah is the original incoming C code into pycparser incorrect, as @natezb points out? I suggest discussing possible workarounds separately, or opening a more concrete bug against pycparser if it seems like it's parsing something wrongly.

@asavah

asavah commented Feb 26, 2017

@eliben I don't actually understand all the lower-level stuff going on here.
I stumbled into the issue while building cryptography (git) with pycparser (git), so I decided to report it in the PR that broke the previously working behaviour, just to let you and possibly others know.
It's up to you to decide whether to fix this or report it to the cffi people, since they're doing something nasty that they're not supposed to from pycparser's POV.

Huge thanks for your great work.

@eliben
Owner

eliben commented Feb 26, 2017

@asavah fair enough, thanks for reporting!

@natezb
Contributor Author

natezb commented Feb 26, 2017

Just to close out this thread, I've moved the discussion to a new cffi issue. Depending how that goes, I may start a new issue/PR on what pycparser can do to help.

@natezb natezb deleted the master branch February 28, 2017 01:30
@jiffe

jiffe commented Apr 20, 2017

I just cloned from master, and I'm seeing a number of errors and warnings that look related to this change:

ERROR: c_parser.py:630: Symbol 'id_init_declarator_list_opt' used, but not defined as a token or a rule
ERROR: c_parser.py:709: Symbol 'declaration_specifiers_no_type_opt' used, but not defined as a token or a rule
ERROR: c_parser.py:714: Symbol 'declaration_specifiers_no_type_opt' used, but not defined as a token or a rule
ERROR: c_parser.py:719: Symbol 'declaration_specifiers_no_type_opt' used, but not defined as a token or a rule
WARNING: plyparser.py:42: Rule 'specifier_qualifier_list_opt' defined, but not used
WARNING: plyparser.py:42: Rule 'declaration_specifiers_opt' defined, but not used
WARNING: There are 2 unused rules
WARNING: Symbol 'id_init_declarator' is unreachable
WARNING: Symbol 'id_init_declarator_list' is unreachable
WARNING: Symbol 'specifier_qualifier_list_opt' is unreachable
WARNING: Symbol 'declaration_specifiers_opt' is unreachable
ERROR: Infinite recursion detected for symbol 'direct_declarator'
ERROR: Infinite recursion detected for symbol 'declaration_specifiers_no_type'

jiffe pushed a commit to jiffe/pycparser that referenced this pull request Apr 24, 2017
This reverts commit 14b8a7e.

Conflicts:

	pycparser/plyparser.py