New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #312, #317, Response\*Visitor rewrite #406
Fix #312, #317, Response\*Visitor rewrite #406
Conversation
Mike, I'm having trouble understanding some decisions in OperationResponseParser. For some reason, the responsibility of scanning additional properties is transferred to the parser, while it should be a responsibility of a Visitor. For example, in the revised Currently the ::visitAdditionalProperties() method is responsible of poking visitors and forcing them to visit individual "members of array" that would come from In the newest ps: |
I agree that it looks like there's a lot of code duplication. I'm open to changes as long as they aren't BC changes. The OperationResponseParser does, however, need to know which visitors were triggered in order to call the appropriate before and after methods of each visitor (regardless of if it's a no-op or not).
Yeah, locations are only relevant for top-level properties.
With the current setup, the before() method is expected to create an array response based on each found location. Not all locations do this sort of thing, it's only really found in parts of an entity body (e.g., json, xml). If this were changed (as you are proposing), then this would no longer make sense. However, we will always need to trigger before and after methods for each visited visitor before we actually visit anything. As long as that still happens, then changing things around is fine. Note that you can have defined parameters that all have a location different than the location of an additionalProperties parameter (real parameters with a location of header, while AP has a location of json). |
- prevent visitors from resetting the resulting array (allowing for custom visitors performing different operations on resulting structures). - make visitor processing non-conflicting with other visitors (locations). - make JSON and XML vistors' processing respective structures internally, picking out necessary properties. - move processing of additionalProperties out of OperationResponseParser and into visitors keeping BC.
Ok, I'm almost finished. I'll add NS support for XML... I'll also add some more complex combination tests and XML tests. I'll also work on coverage because I've found there is a lot of functionality that is described in docs but not tested at all in unit tests. I'm off for tonight. Cheers! |
/** | ||
* @param array $json | ||
*/ | ||
public function setJson(array $json) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does setJson and getJson need to be public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, they are used for unit tests and can be handy for add-ons.
For example, you might want to have a visitor that re-visits JSON (or XML) to fetch some additional data off of them. getXml() and getJson() allow such visitors to exist and work on the original document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean, but I'd rather the internals of the visitor not be exposed (e.g. SimpleXMLElement) in case the implementation changes (e.g. moving to XmlReader or something).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n/p.
Subclass for tests ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Subclassing sounds fine. You could also try PHPUnit's $this->readAttribute()
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately there's no writeAttribute()
so subclassing it'll have to be (to keep SOC in tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
I've been testing this branch on some real-life data and it seems that my current implementation of |
- universal support for visiting top-level array in service description. - support for retrieving top-level array as unnamed or named model property. - support for 'location' definition within 'items'. - support for 'location' definition on model itself. - drop the requirement of using array( 'jsonFlatten' => 1).
I was able to rewrite
|
@mtdowling I've got one thing to crack, usability-wise. Because I've dropped the requirement
Make sense, right ? The case when this doesn't work is when Here's a workaround I'm currently using.. I couldn't use Phew. I hope you're still with me. I don't know how I could make this easier. I really like dropping that Feedback much appreciated ;-) |
You've been busy! I liked |
Have you seen those example models in test cases ? |
I did look at the |
Consider the following two examples. [{"foo":1},{"foo":2},{"foo":3}] // this one is an "array" schema with JSON location that will
// read the above JSON into a model that can be iterated
// with foreach($model as $item) { ... }
'models' => array(
'Foo' => array(
'type' => 'array',
'location' => 'json',
'items' => array(
'type' => 'object'
)
)
) // this one is an "object" schema which uses 2 different
// locations. It will read read the above JSON into a
// model under "result", so one can iterate via
// foreach($model['result'] as $item) { ... }
'models' => array(
'Foo' => array(
'type' => 'object',
'properties' => array(
'result' => array(
'type' => 'array',
'location' => 'json',
'items' => array(
'type' => 'object'
)
),
'code' => array(
'location' => 'statusCode'
)
)
)
) |
The problem is: in example 2, how does the user tell that the json array (put inside $model['result']) is supposed to be an unnamed array ? |
Right. So the first example works, which is great. For the second example, I'm suggesting we go back to adding an Alternatively, what if we just don't support example #2 with the |
Hmm. A few questions then. If example #2 required a path, what would the path be ? |
The second example would be We're still finalizing the last bit of the grammar and I'm still working on the PHP implementation (with a lexer, parser, and AST). It should be released in the next few weeks.
I agree, but I also wouldn't want to confuse people into thinking that the visitor accepts jsonPath expressions (which JmesPath is similar but not 1:1).
I know, and I agree with you :). I don't have a good explanation for the name. |
Ok, here's what I propose: because of the limitations as described in example 2, I will replace This is 100% BC, does not mess with example 1 (being the simplest case) and is easy to explain. We will be then able to merge this one and start work on Final example no. 2: // this one is an "object" schema which uses 2 different
// locations. It will read read the above JSON into a
// model under "result", so one can iterate via
// foreach($model['result'] as $item) { ... }
'models' => array(
'Foo' => array(
'type' => 'object',
'properties' => array(
'result' => array(
'type' => 'array',
'location' => 'json',
'sentAs' => '*',
'items' => array(
'type' => 'object'
)
),
'code' => array(
'location' => 'statusCode'
)
)
)
) |
This looks really great! I've gone through and optimized / cleaned up a few things. Can you take a look at 99d3508?
I have one question though: Why are you setting and checking if |
Looks like the rewritten XmlVisitor is not performing and same parsing as the XmlVisitor in master. I'm not sure yet why, but almost all of the Note: the I believe this is the issue: https://github.com/guzzle/guzzle/blob/response-visitors/src/Guzzle/Service/Command/LocationVisitor/Response/XmlVisitor.php#L96. If the type is |
yes. It was branching because of the difference on how
Isn't that exactly the same thing ?
I believe I know the answer. With the "magical" xml to array conversion through json, on the top level, attributes were shoved under I'll try adding a test case based on what you sent me. Although it seems that you have been cherry-picking commits from this PR, which means I don't have your recent changes and we're "forking apart" from each other... this can lead to strange things. |
Hm ok. I wonder if
It should be, but I think the visiting is recursing incorrectly. When I changed that particular line from iterating children to iterating the node itself, almost everything started working.
Maybe we should emulate the old implementation of placing attributes in
I merged your branch entire onto master and then made a commit. Can you not use that? |
I still don't get that. Throw some code at me. |
Ah, so you skipped github... that's why it's still open and there's no trace of it. Ok. |
Ok. If you get rid of the foreach ($node as $child) {
$result[] = $this->recursiveProcess($items, $child);
}
Is there another way I can get the code to you that would be better? Should I fork your repo and submit a PR to your branch? Let me know what works best for you. |
Line 94... .change: if ($sentAs) { to: if ($items->getSentAs()) {
When working with zend framework we tend to always follow the basic PR flow. In case there's a need to augment a PR, contributors send PRs against the branch of the original author which he quickly merges. After the whole batch of work is done PR is merged as a whole. |
Ok. I'll send a PR to your branch. |
I was thinking about that. We could do that, although merging allows for much "cleaner" means of working with attributes. It's easy to iterate over them, and attribute-heavy xml documents are much easier to work with. But there's the BC problem I acknowledge :-\ Your call... |
Let's go with adding attributes to |
I'll merge with current master (hope it doesn't explode) and do the work. |
btw: did my fix work for you ? |
That appears to have fixed some issues, but many others still remain. For example, it looks like top-level additonalProperties aren't being picked up for some reason. |
Mike, I'd like to have this merged as it fixes a lot of bugs. Tests pass. |
Ok, I'm going through failed tests on aws-php-sdk. |
Ok, I know where the problem lies. Here's a result descr. from aws sdk (exerpt): 'DeleteObjectsOutput' => array(
'type' => 'object',
'additionalProperties' => true,
'properties' => array(
'Deleted' => array(
'type' => 'array',
'location' => 'xml',
'data' => array(
'xmlFlattened' => true,
),
'items' => array(
'type' => 'object',
'properties' => array(
'Key' => array(
'type' => 'string',
),
'VersionId' => array(
'type' => 'string',
),
'DeleteMarker' => array(
'type' => 'boolean',
),
'DeleteMarkerVersionId' => array(
'type' => 'string',
),
),
),
),
// [...] Now that tells me, that we're expecting an xml document with Here's the document that we're testing against: <DeleteResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Deleted>
<Key>sample1.txt</Key>
</Deleted>
<Error>
<Key>sample2.txt</Key>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
</Error>
</DeleteResult> This document does not fit the description - That's probably why you've been using the How would you like to approach this ? |
Here's another problem I've found with aws definitions: 'Errors' => array(
'type' => 'array',
'location' => 'xml',
'sentAs' => 'Error',
'data' => array(
'xmlFlattened' => true,
),
'items' => array(
'type' => 'object',
'sentAs' => 'Error',
'properties' => array(
'Key' => array(
'type' => 'string',
),
'VersionId' => array(
'type' => 'string',
),
'Code' => array(
'type' => 'string',
),
'Message' => array(
'type' => 'string',
),
),
),
), Given the result XML mentioned above, the intention is to have I'm not sure what to make of it. The whole visitor thing is already quite foggy for me, as there are a lot of inconsistencies (i.e. same params meaning something different depending on location/nesting) and code duplication in each visitor. I'm becoming quite confused and annoyed by the whole thing. My main motivation is to-make-it-work™ - and by that I mean response parser and visitors were not able to parse simple remote api's (xml and Json) that I have to use for my real-world™ app - because of issues that are listed on the top of this ticket. But now, deeper into the rabbit hole, I'm having trouble getting out :-) I'm tempted to either dump the project altogether and end up using hand-crafted transformers and hydrators with a decent http lib, or work through the above, or rewrite the whole bloody visitor/processing mess. Any friendly advice greatly appreciated. |
I want to get this merged as well. As you can see though, there are a lot of changes going on, so I don't want to rush it. I apologize for not being able to spend that much time on it. The first example you are referencing is how we handle repeated keys under the same container. In this case, the "DeletedContainer" would be repeated underneath it's parent node. However, we want to represent the "Deleted" nodes in an array the same way if there is one value or many values. Consider this example: <DeleteResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Deleted>
<Key>sample1.txt</Key>
</Deleted>
<Deleted>
<Key>sample2.txt</Key>
</Deleted>
<!-- ... --!>
</DeleteResult> When processed, the containing result will have a "Deleted" key that has an array of associative arrays containing each parsed "Deleted" XML node. I'm not sure about the second example. It looks like there's an error with how we are generating the description, and only one of those sentAs attributes is required. It looks like you've deduced that the "sentAs" on the item is redundant. This seems like a bug in the description more than an issue with the current implementation. I'll be able to look into this early next week.
If you choose to go this route, Guzzle is an HTTP library too (the best IMO).
I like this option :)
You basically have done that at this point. Let's finish this off. I'd definitely like to hear your thoughts on how it could be improved though even after your refactoring. We could chat about it on IRC or something whenever you'd like. |
Let's tackle the first one.
So there's a duality, branching within the parsing logic. Basically:
The thing is, that might or might not be specific to xml. Here's an example in JSON: Specimen 1{
"result": {
"ok" : true
}
} In order to parse it correctly, we have to set Specimen 2{
"result": [{
"ok" : true
},{
"ok" : "somewhat"
}]
} The above does not fit the description of an object, so it would not processed - i.e. empty/null. In order to process it, we'd have to have an array with Following the logic from aws and
The problem I have with that is it's slippery slope. It's "wiggly", "relaxed specification" or rather recommendation. Note how this behavior (pre-PR) is not symmetrical - xml visitor does this "magic", while json visitor won't. While I think about it, what would make sense to me is parameter alternatives. I'm not sure yet how, but it would make sense to define the parameter as being dual in nature - either an array (list) or a single item. Unless I got this all wrong... what do you think ? |
btw: problem with current xml visitor duality - you can't rely on the resulting model... i.e. echo $model->deleted->key; // <-- error in case there are 2 errors
foreach($model->deleted as $del){
echo $del->key; // <-- error in case there is only 1 error
} My gut tells me (model) specifications and api schemas are there for a reason. I'm either expecting an array or an object and the rest of the code follows the spec - nothing unexpected can happen with that. |
Parsing XML into a json like strucure (PHP arrays) requires more configuration options than parsing JSON into a PHP array. For example, this XML example, the result from parsing should give the same structure regardless of if there is one node or multiple nodes. We have flattened XML attributes because this is something that can happen in XML responses. It can't happen in json, so there's no need to have any json flattening. This is because JSON has arrays and objects, while XML can have many permutations: repeated nodes with the same name and no container, ditto and a container, attributes, etc. The whole goal of parsing responses is to represent the data received over the wire in a PHP assoc array that will provide consistent results of translating the over the wire format to the array. This is more difficult with XML, hence the extra data attributes. I don't consider this magic in the XML visitor, but simply a necessity of parsing XML into an associative array.
|
That's not entirely true. It's a form of a processing pipe. You push XML/JSON from one side, a model comes out of the other side. Input formats differ in semantics, grammar and capabilities. Key differences here are:
The first difference is solvable by paths and scoping (this PR adds support for namespace scoping). The second difference is moot in my opinion. It's a characteristic of the input format. It's not even a real feature - xml just repeats nodes and doesn't call that a list/array, but a node with child nodes. It's basically the same thing. So the first step is reading, ingesting the data in particular format. That's least important as it's just Guzzle's implementation detail. What is important here is what comes out of Guzzle. The example above with json exchanging arrays and objects is not without a merit. You could have the very same case with JSON, where you could get a response in form of an array or a single object. The grammar differs, the principle doesn't - it's 1 or more objects. I have a problem with the output not being symmetrical, not the input. The input doesn't matter. Many APIs allow you to switch data formats between XML/SOAP/JSON/array etc. Because Guzzle supplies api model abstraction layer I'm expecting Guzzle to take care of interpreting the input and giving me the same (predictable) output every single time... regardless of headers, chunking, ssl, revalidation, caching, data format, text encoding format et al. Symmetry argument is very important in terms of using models and consuming incoming model. As mentioned in the example above, if my userland code can't rely on the incoming model from guzzle, then what's the point of using it. I could just load the XML and |
I appreciate you taking the time to convey your thoughts on this.
For better or worse, as of right now, Guzzle's service descriptions are closely tied to the serialization format used over the wire (e.g., xml, json, etc). This is evident in the special care that needs to be taken to parse XML responses into the normalized array like syntax of Guzzle's response models. The current service description format was not designed to work across different serialization formats for the same API. The reason for this is because the end user of a web service client doesn't care about the over the wire format of the webservice because it is abstracted away by the service description. Service descriptions are written with a specific serialization format in mind from the start to work around this problem entirely. This is something that we can definitely look at for a future version of Guzzle (maybe Guzzle service descriptions become versioned or something).
That is incorrect. One of the purposes of the $subsetOftheReallyBigServiceDescription = array(
'DeleteObjectsOutput' => array(
'type' => 'object',
'additionalProperties' => true,
'properties' => array(
'Deleted' => array(
'type' => 'array',
'location' => 'xml',
'data' => array(
'xmlFlattened' => true,
),
'items' => array(
'type' => 'object',
'properties' => array(
'Key' => array(
'type' => 'string',
),
'VersionId' => array(
'type' => 'string',
),
'DeleteMarker' => array(
'type' => 'boolean',
),
'DeleteMarkerVersionId' => array(
'type' => 'string',
),
),
),
),
'Errors' => array(
'type' => 'array',
'location' => 'xml',
'sentAs' => 'Error',
'data' => array(
'xmlFlattened' => true,
),
'items' => array(
'type' => 'object',
'sentAs' => 'Error',
'properties' => array(
'Key' => array(
'type' => 'string',
),
'VersionId' => array(
'type' => 'string',
),
'Code' => array(
'type' => 'string',
),
'Message' => array(
'type' => 'string',
),
),
),
),
'RequestId' => array(
'location' => 'header',
'sentAs' => 'x-amz-request-id',
),
),
)
); Let's say we get this response back from the service: <?xml version="1.0" encoding="UTF-8"?>
<DeleteResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<Deleted>
<Key>sample1.txt</Key>
</Deleted>
<Error>
<Key>sample2.txt</Key>
<Code>AccessDenied</Code>
<Message>Access Denied</Message>
</Error>
</DeleteResult> Regardless of if the XML response returned a single Deleted node or multiple Deleted nodes, the response model will always contain a "Deleted" key that contains an array of objects. The above XML parsed using the service description I linked will create the following response model (represented as JSON just to type it quickly): {
"Deleted": [{"Key": "string", "VersionId": "string", ... }, {...}, ...],
"Errors": [{"Key": "string", "VersionId": "string", ... }, {...}, ...],
} (You can also see the structure in the API docs of the SDK for this method: http://docs.aws.amazon.com/aws-sdk-php/latest/class-Aws.S3.S3Client.html#_deleteObjects). Collections of XML nodes can be represented in various ways while a collection of JSON values can only be represented in an array. Wrapping node (this is not flattened because "things" is the collection of nodes (an array type) and "thing" is a node stored in the collection): <response>
<things>
<thing>...</thing>
<thing>...</thing>
</things>
</response> No wrapping node (flattened): <response>
<thing>...</thing>
<thing>...</thing>
</response> |
Gotcha. So I have to reincarnate |
any news on this? |
The service description layer of Guzzle 4 has been rewritten. Many of the design decisions for the rewrite were taken from this issue. You can find the service description layer for Guzzle 4 at https://github.com/guzzle/guzzle-services. Guzzle 3 development has moved to https://github.com/guzzle/guzzle3. Feel free to reopen this PR on the Guzzle 3 repository (though I feel this is a risky PR due to the amount of changes and considering that 4.0 is now being released). |
👍 ... a lot of hours went into that, but I'm happy that at least it was sorta inspiration for refactor. |
This fixes #312 and allows for traversal of XML documents including nested attributes and namespaces.
SimpleXMLElement
toarray
XMLVisitor
.OperationResponseParser::visitAdditionalProperties()
limitationsadditionalProperties
stances.location
definition.additionalProperties