-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add column support in c_parser #178
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
A general question: did you measure the performance impact of this change?
@@ -51,6 +51,13 @@ def _coord(self, lineno, column=None): | |||
line=lineno, | |||
column=column) | |||
|
|||
def _token_coord(self, p, token_idx): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function could use a docstring, as well as comments inside
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right! I will add it.
tests/test_c_parser.py
Outdated
self.assertEqual(node.coord.line, line) | ||
if column: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if column == 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, the column
implemented keeps the semantic of lex, so the column always starts at 1, not 0.
Do you want me to test against None
(plus an assert against 0)?
By the way, do you agree to keep the semantic of lex (starting column AND lineno at 1) or do you want a modification to starts column at 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a default value of None and a test against None should be better, I think
Starting column at 1 is fine
self.assert_coord(f1.ext[0], 2, 'test.c') | ||
self.assert_coord(f1.ext[1], 3, 'test.c') | ||
self.assert_coord(f1.ext[2], 6, 'test.c') | ||
self.assert_coord(f1.ext[0], 2, 13, 'test.c') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not very intuitive that this is column 13... How about wrapping parse to strip leading whitespace as long as the first line or something like that, to get a more expectedoffset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in fact, as the column
starts at 1, there is no spaces counted in tokens. the 13 starts on the a
:
TypeDecl: a, []
IdentifierType: ['int']
Again, if you don't agree with the lex column start, don't hesitate to tell me and I will modify the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used an (behemoth.h from https://github.com/snare/ida-efiutils/blob/master/behemoth.h) input file of 21945 lines (preprocessed). cProfile For the c_parse.py:parse:
without patch: 1.774s/1.784s/1.801s
with patch: 1.847s/1.857s/1.891s
So about 4% of speed lost
As the function is often called in this case (about 31000), I suspect a little overhead of cprofile.
Raw value on the whole python script:
without patch: 1.997s/2.030s/2.093s
with patch: 2.040s/2.053s/2.090s
(Note the max time of both runs: maybe the timing measures are not really correct here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant all the leading space, not 0-1 starting. Without columns it wasn't bothersome but now it's hard to see why this should be 13 and not some close-to-that number because everything is indented. To keep this patch small, I'm OK with fixing this somehow in a follow-up
I also modified the regression tests in which the file name was not present to comply with the fixed |
Thanks, looks good |
Hi!
This PR adds the support of
coord.column
in the ast parser.It's inspired from the #92, by @bbejot. The main difference is that it only
fixes the missing
column
: it doesn't not add theend_lineno
nor theend_column
.Note: As the
_coord
function is not really used any more (only two calls remaining),tell me if you prefer removing it in favour of
._token_coord
.