Fix parsing in Header::normalize() #447
Simply splitting at commas results in incorrect output, as each list item might itself contain a comma within a quoted-string. While ETags are not technically a quoted-string (specifically, backslash escapes are not supported), they serve as a good example: If-None-Match: "foo", "foo,bar", "bar" refers to three ETags, not four. Simply reuse the existing array implementation, which correctly handles quoted-strings without any backslash escapes.
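To make the failure concrete, here is a minimal sketch that applies PHP's built-in explode() to the example header value above. The variable names are illustrative only and do not come from the library.

```php
<?php
// Header value from the example above: three ETags, one of which contains a comma.
$value = '"foo", "foo,bar", "bar"';

// Naive split: the comma inside the second ETag is treated as a list separator.
$naive = array_map('trim', explode(',', $value));

var_dump(count($naive)); // int(4), although the header only lists three ETags
print_r($naive);         // the second ETag is cut into '"foo' and 'bar"'
```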
The code did not correctly handle a backslash-escaped quote, misparsing headers such as: private, community="Guzzle\"Psr7"
RFC 7230#7:
> For compatibility with legacy list rules, a recipient MUST parse and ignore a
> reasonable number of empty list elements:
and
> Empty elements do not contribute to the count of elements present.
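As a rough illustration of the rule quoted above, the following sketch drops empty list elements after trimming. It deliberately uses a plain explode(), which is only safe here because the sample value contains no quoted strings; the sample value itself is made up for this example.

```php
<?php
// Made-up list value with leading, doubled and trailing commas.
$value = ', foo,, bar ,';

$elements = [];
foreach (explode(',', $value) as $element) {
    $element = trim($element);
    if ($element !== '') { // empty elements are ignored and not counted
        $elements[] = $element;
    }
}

print_r($elements); // contains 'foo' and 'bar' only
```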
$result = [];
foreach ($header as $value) {
foreach ((array) $header as $value) {
This first loop is a bit odd, because it appears to allow passing in an array of arrays. It is not useful to parse more than one type of header at once, as the list syntax is not defined for every header, i.e. simply passing in the full ->getHeaders() array is not valid.
All in all this normalize() function is not documented super-well, the method name is not really helpful and I don't really trust the parsing logic based on regular expressions.
I suggest that this method is deprecated and replaced by a public static function splitList(array|string $header) that either accepts a string or a string[], properly validating the input. The parser should be handwritten (simply handling quoted strings is easy), because that regular expression is pretty magic.
I can perform this change, but would like to hear your opinion first.
I am reluctant to make big changes here. Can we go with whatever the minimum changes are to get your use case to work?
The current diff as it is in this PR contains the minimally necessary changes to correctly implement parsing of headers containing lists.
My proposed change with the deprecation of the normalize() method would just be to tighten up the API and to clearly define the method's behavior.
I just noticed your comment over in WoltLab/WCF (notifications for that repository go to $companymail):
This PR is not fixing an actual issue with our use case. This PR is primarily a matter of standards correctness. The only real bug is when a string instead of an array is passed to normalize(), because then a simple explode() will be used.
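As I read the diff, the (array) cast lets a single loop handle both input shapes, so a string no longer needs a separate explode() path. A small sketch of how PHP's cast behaves (the sample values are made up):

```php
<?php
// Casting a string to array wraps it in a single-element array;
// casting an array leaves it unchanged.
$fromString = (array) 'no-cache, private';
$fromArray  = (array) ['no-cache', 'private'];

var_dump($fromString === ['no-cache, private']);  // bool(true)
var_dump($fromArray === ['no-cache', 'private']); // bool(true)
```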
I've also re-added the php-http Slack to my Slack Desktop Client. Feel free to message me there if you feel that it would help.
ad75894 to 8b7f198
8b7f198 to 36e879f
Could we also add the date test case from #172?
How come you do not trust the regex? Should we add more tests to make sure all headers are parsed properly?
I think we can change the doc block to show the intended input of this method.
No, this is not useful: The https://httpwg.org/specs/rfc7230.html#rfc.section.3.2.2
Using the
With the look-ahead it's a pretty complicated regular expression that is hard to verify for correctness in edge cases. I think it's easier to verify the correctness of a simple character-based parser, especially one that only needs to handle a very small number of cases. Of course your mileage might vary.
I've added all the edge cases I could think of as tests.
Expanding the doc block would certainly be helpful. However I also think that That's why I proposed To have something actual to look at, my proposed replacement method would look like this. It passes the same tests:
/**
 * Splits a HTTP header defined to contain a comma-separated list into
 * each individual value. Empty values will be removed.
 *
 * Example headers include 'accept', 'cache-control' and 'if-none-match'.
 *
 * This method must not be used to parse headers that are not defined as
 * a list, such as 'user-agent' or 'set-cookie'.
 *
 * @param string|string[] $values Header value as returned by MessageInterface::getHeader()
 * @return string[]
 */
public static function splitList($values): array
{
    if (!\is_array($values)) {
        $values = [$values];
    }

    $result = [];
    foreach ($values as $value) {
        if (!\is_string($value)) {
            throw new TypeError('$values must either be a string or an array containing strings.');
        }

        $v = '';
        $isQuoted = false;
        $isEscaped = false;
        for ($i = 0, $max = \strlen($value); $i < $max; $i++) {
            // The character after a backslash is copied verbatim.
            if ($isEscaped) {
                $v .= $value[$i];
                $isEscaped = false;
                continue;
            }

            // A comma outside of a quoted-string ends the current element.
            if (!$isQuoted && $value[$i] === ',') {
                $v = \trim($v);
                if ($v !== '') {
                    $result[] = $v;
                }
                $v = '';
                continue;
            }

            // Backslash escapes are only recognized inside a quoted-string.
            if ($isQuoted && $value[$i] === '\\') {
                $isEscaped = true;
                $v .= $value[$i];
                continue;
            }

            // An unescaped double quote opens or closes a quoted-string.
            if ($value[$i] === '"') {
                $isQuoted = !$isQuoted;
                $v .= $value[$i];
                continue;
            }

            $v .= $value[$i];
        }

        // Flush the last element of this header line.
        $v = \trim($v);
        if ($v !== '') {
            $result[] = $v;
        }
    }

    return $result;
}
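To see how such a character-based parser behaves on the examples from this thread, here is a condensed standalone sketch. The function name splitListSketch is hypothetical and not part of the library; it only accepts a single string and is meant to mirror the state machine above, not replace it.

```php
<?php
// Hypothetical condensed helper for illustration; single string input only.
function splitListSketch(string $value): array
{
    $result = [];
    $current = '';
    $isQuoted = false;
    $isEscaped = false;

    foreach (str_split($value) as $char) {
        if ($isEscaped) {
            // Character following a backslash is copied verbatim.
            $current .= $char;
            $isEscaped = false;
        } elseif (!$isQuoted && $char === ',') {
            // Unquoted comma: finish the element, skipping empty ones.
            if (trim($current) !== '') {
                $result[] = trim($current);
            }
            $current = '';
        } else {
            if ($isQuoted && $char === '\\') {
                $isEscaped = true;
            } elseif ($char === '"') {
                $isQuoted = !$isQuoted;
            }
            $current .= $char;
        }
    }
    if (trim($current) !== '') {
        $result[] = trim($current);
    }

    return $result;
}

$etags  = splitListSketch('"foo", "foo,bar", "bar"');
$cache  = splitListSketch('private, community="Guzzle\"Psr7"');
$sparse = splitListSketch(', foo,, bar ,');

var_dump(count($etags));  // int(3)
var_dump(count($cache));  // int(2)
var_dump(count($sparse)); // int(2)
```

Keeping the quote and escape state in two booleans means each character is inspected exactly once, which is what makes the edge cases comparatively easy to check by hand.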
Friendly ping 😃 Can this get looked at? I don't need this fix in a release immediately. I'd like to see this PR merged so that I can rely on this arriving in a release eventually, allowing me to build on top of it. Summarizing:
I'm happy to answer your questions here in this PR, or in PHP HTTP Slack.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 2 weeks if no further activity occurs. Thank you for your contributions.
This still is an issue and I'd still like to see this merged.
This sure is complex. Sorry for the delay.
I do agree with the tests and I am happy with the changes.
@GrahamCampbell added an additional test in #476. You will get the credit for this work. Thank you.
That works for me, thank you. I've created a PR with my suggested cleaned up replacement API: #477. It's not as critical, because at least Thanks!