Hex chars greater than \x7f aborts silently the parsing #550

Open
nicolasrod opened this issue Nov 24, 2018 · 8 comments

Comments

@nicolasrod

Hi. I ran into an issue with hex escapes in a double-quoted string. If I have a piece of code like the following:

<?php 
$a = "\x6f";

I get the following result:

[{"nodeType":"Stmt_Expression","expr":{"nodeType":"Expr_Assign","var":{"nodeType":"Expr_Variable","name":"a","attributes":{"startLine":2,"endLine":2}},"expr":{"nodeType":"Scalar_String","value":"o","attributes":{"startLine":2,"endLine":2,"kind":2}},"attributes":{"startLine":2,"endLine":2}},"attributes":{"startLine":2,"endLine":2}}]

But if the variable holds a value greater than \x7f, I get an empty array as a result and no error.
Any ideas?
Thank you!

nicolasrod changed the title from "Hex chars > \x7f aborts silently the parsing" to "Hex chars greater than \x7f aborts silently the parsing" on Nov 24, 2018
@nikic
Owner

nikic commented Nov 24, 2018

The problem here is probably in the JSON encoding. JSON only allows valid UTF-8 in strings, and a lone byte greater than \x7f (such as \x80) is not a valid UTF-8 sequence.
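
To illustrate the failure mode described here, the following is a minimal sketch using only PHP's built-in `json_encode`, outside of the parser. The array shape and variable names are made up for the example. It shows that an ASCII byte encodes normally, while a lone byte above \x7f makes `json_encode` return `false` without raising an error unless the caller checks for one.

```php
<?php
// "\x6f" is ASCII "o", which is also valid UTF-8, so encoding succeeds.
$ok = json_encode(['value' => "\x6f"]);

// "\x80" on its own is not a valid UTF-8 sequence, so encoding fails.
// json_encode() signals this only through its return value (false).
$bad = json_encode(['value' => "\x80"]);

// The reason has to be queried explicitly right after the failed call.
$errorCode = json_last_error();
$errorMessage = json_last_error_msg();

var_dump($ok);
var_dump($bad);
echo $errorMessage, PHP_EOL;
```

If the caller prints the result without checking for `false`, the output is simply empty, which would match the "no error" behaviour reported above. On PHP 7.3 and later, passing the `JSON_THROW_ON_ERROR` flag makes the failure visible as a `JsonException` instead.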

@performantdata

@nikic I don't understand your answer here. A string in PHP is an array of bytes, so any valid byte values are allowed. The problem is that you're representing it as a string in JSON, instead of as an array of numbers.

@tiyeuse

tiyeuse commented May 24, 2019

Any update on this issue?

@nikic
Owner

nikic commented May 24, 2019

Nope. Any suggestions on what to do about this?

@zhaoyanliang2

zhaoyanliang2 commented May 24, 2019

Before converting the AST to JSON, iterate through all nodes and encode any string value containing invalid UTF-8 using base64_encode.

@performantdata

> Any suggestions on what to do about this?

> The problem is that you're representing it as a string in JSON, instead of as an array of numbers.

So represent it as that. A PHP string is not an array of Unicode characters, it's just an array of bytes.

> This nature of the string type explains why there is no separate “byte” type in PHP – strings take this role.

So stop trying to convert an arbitrary sequence of bytes into UTF-8.
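
As a rough sketch of the "array of numbers" idea, the helpers below turn a PHP string into a JSON array of byte values and back. The function names `bytesToJson` and `jsonToBytes` are hypothetical and are not part of PHP-Parser; this only shows that the representation round-trips arbitrary bytes, not how the library would integrate it.

```php
<?php
// Hypothetical helper: represent a PHP string as a JSON array of byte values.
function bytesToJson(string $s): string
{
    // unpack('C*') returns a 1-indexed array of unsigned bytes;
    // array_values() re-indexes it so json_encode() emits a JSON array.
    $bytes = $s === '' ? [] : array_values(unpack('C*', $s));
    return json_encode($bytes);
}

// Hypothetical helper: rebuild the original byte string from that JSON array.
function jsonToBytes(string $json): string
{
    $bytes = json_decode($json, true);
    return implode('', array_map('chr', $bytes));
}

$original = "\x6f\x80\xff";
$json = bytesToJson($original);
$restored = jsonToBytes($json);

echo $json, PHP_EOL;
var_dump($restored === $original);
```

The trade-off is size and readability: every string, including plain ASCII ones, becomes a list of integers in the output.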

@nikic
Owner

nikic commented May 24, 2019

> Before converting the AST to JSON, iterate through all nodes and encode any string value containing invalid UTF-8 using base64_encode.

That sounds reasonable. We can add two extra visitors for encoding/decoding all strings in base64. It's unfortunate that this is necessary, but I don't really see a way around it.
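
A possible shape for the encode/decode pair is sketched below. It is only an illustration under several assumptions: it walks plain nested arrays rather than PHP-Parser node objects, it does not use the library's visitor API, it wraps only strings that are not valid UTF-8 (rather than all strings), and the `__base64` marker key is invented for this example.

```php
<?php
// Hypothetical marker key; not part of PHP-Parser's JSON format.
const BASE64_KEY = '__base64';

// Replace every string that is not valid UTF-8 with a base64 wrapper.
function encodeBinaryStrings($data)
{
    if (is_array($data)) {
        return array_map('encodeBinaryStrings', $data);
    }
    // preg_match() with the /u modifier returns 1 only for valid UTF-8 input.
    if (is_string($data) && preg_match('//u', $data) !== 1) {
        return [BASE64_KEY => base64_encode($data)];
    }
    return $data;
}

// Reverse the transformation after json_decode(..., true).
function decodeBinaryStrings($data)
{
    if (!is_array($data)) {
        return $data;
    }
    if (count($data) === 1 && isset($data[BASE64_KEY]) && is_string($data[BASE64_KEY])) {
        return base64_decode($data[BASE64_KEY]);
    }
    return array_map('decodeBinaryStrings', $data);
}

// Simplified stand-in for a parsed string node.
$node = ['nodeType' => 'Scalar_String', 'value' => "\x80abc"];

$json = json_encode(encodeBinaryStrings($node));
$back = decodeBinaryStrings(json_decode($json, true));

echo $json, PHP_EOL;
var_dump($back === $node);
```

Wrapping only the invalid strings keeps ordinary output unchanged, but consumers then have to handle two shapes for string values. Encoding all strings, as proposed above, would be more uniform at the cost of readability.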

@tiyeuse

tiyeuse commented Jul 23, 2019

Bump on this error 😃
Will a fix be deployed?
