Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add column support in c_parser #178

Merged
merged 1 commit into from Mar 10, 2017
Merged

Conversation

serpilliere
Copy link
Contributor

Hi!

This PR adds the support of coord.column in the ast parser.

It's inspired from the #92, by @bbejot. The main difference is that it only
fixes the missing column: it doesn't not add the end_lineno nor the
end_column.

Note: As the _coord function is not really used any more (only two calls remaining),
tell me if you prefer removing it in favour of ._token_coord.

Copy link
Owner

@eliben eliben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

A general question: did you measure the performance impact of this change?

@@ -51,6 +51,13 @@ def _coord(self, lineno, column=None):
line=lineno,
column=column)

def _token_coord(self, p, token_idx):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function could use a docstring, as well as comments inside

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right! I will add it.

self.assertEqual(node.coord.line, line)
if column:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if column == 0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, the column implemented keeps the semantic of lex, so the column always starts at 1, not 0.
Do you want me to test against None (plus an assert against 0)?

By the way, do you agree to keep the semantic of lex (starting column AND lineno at 1) or do you want a modification to starts column at 0?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a default value of None and a test against None should be better, I think

Starting column at 1 is fine

self.assert_coord(f1.ext[0], 2, 'test.c')
self.assert_coord(f1.ext[1], 3, 'test.c')
self.assert_coord(f1.ext[2], 6, 'test.c')
self.assert_coord(f1.ext[0], 2, 13, 'test.c')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not very intuitive that this is column 13... How about wrapping parse to strip leading whitespace as long as the first line or something like that, to get a more expectedoffset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact, as the column starts at 1, there is no spaces counted in tokens. the 13 starts on the a:

  TypeDecl: a, []
    IdentifierType: ['int']

Again, if you don't agree with the lex column start, don't hesitate to tell me and I will modify the behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used an (behemoth.h from https://github.com/snare/ida-efiutils/blob/master/behemoth.h) input file of 21945 lines (preprocessed). cProfile For the c_parse.py:parse:

without patch: 1.774s/1.784s/1.801s
with patch: 1.847s/1.857s/1.891s

So about 4% of speed lost

As the function is often called in this case (about 31000), I suspect a little overhead of cprofile.

Raw value on the whole python script:
without patch: 1.997s/2.030s/2.093s
with patch: 2.040s/2.053s/2.090s

(Note the max time of both runs: maybe the timing measures are not really correct here)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant all the leading space, not 0-1 starting. Without columns it wasn't bothersome but now it's hard to see why this should be 13 and not some close-to-that number because everything is indented. To keep this patch small, I'm OK with fixing this somehow in a follow-up

@serpilliere
Copy link
Contributor Author

  • add is None for column
  • comment for _token_coord

I also modified the regression tests in which the file name was not present to comply with the fixed assert_coord api.

@eliben eliben merged commit 471442f into eliben:master Mar 10, 2017
@eliben
Copy link
Owner

eliben commented Mar 10, 2017

Thanks, looks good

@serpilliere serpilliere deleted the Fix_coord branch March 10, 2017 14:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants