Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The default value of NodeReturnType in XPathAttribute is InnerText #507

Closed
rwecho opened this issue Jul 20, 2023 · 8 comments
Closed

The default value of NodeReturnType in XPathAttribute is InnerText #507

rwecho opened this issue Jul 20, 2023 · 8 comments
Assignees

Comments

@rwecho
Copy link
Contributor

rwecho commented Jul 20, 2023

After the PR #506 merged, I used 1.11.50 in my project.

Now there is a question of reality, We must specify the NodeReturnType with InnerHtml or OuterHtml to the nested node.

Because in the previous version the code we use the innerHtml for Encapsulation:

innerHtmlDocument.LoadHtml(htmlNode.InnerHtml);

Now we use the innerText as default for Encapsulation:

NodeReturnType = ReturnType.InnerText;

In other words, it can lead exception to upgrade the version to 1.11.50, if we use like the below code :

    [HasXPath]
    public class InnerHtml
    {
        [XPath("//div")]
        public List<Link> Links { get; set; } = new List<Link>();

        [HasXPath]
        public class Link
        {
            [XPath("//a", "href")]
            public string Href { get; set; }
        }
    }

        var html = @"
<div>
<a href='1.html' >1</a>
</div>

      //...
      var innerHtml = document.DocumentNode.GetEncapsulatedData<InnerHtml>();
";

It will throw not found error because the Links find in the InnerText. If want the code running success , we must add the [XPath("//div", ReturnType.InnerHtml)]

@elgonzo
@JonathanMagnan

I'm sorry to bring this up so late. Please assess whether this is a hidden danger and whether it may have an impact on some important projects

In my opinion, Explicit declaration is maybe a correct things.

@elgonzo
Copy link
Contributor

elgonzo commented Jul 20, 2023

EDIT: I forgot the modification necessary for the XPathAttribute(string xpathString) constructor in my code below. My apologies. Included now with this edit.


Yeah, that's a nice catch and a very good question.

If backwards compatibility here is important, i have an idea that both maintains backwards compatibility while also enabling proper usage of the NodeReturnType property.

The foundation underlying my idea is to make XPathAttribute keeping state about whether NodeReturnType has been explicitly set or not. For tracking whether an XPathAttribute instance or attribute annotation sets the NodeReturnType explicitly, XPathAttribute will use a boolean property/field indicating exactly that. I name it IsNodeReturnTypeExplicitlySet. It should be an internal property/field so that the HAP library itself can query it but it otherwise remains invisible to user code consuming the HAP library.

Something like this (abridged version of the suggested XPathAttribute implementation):

public sealed class XPathAttribute : Attribute
{
    ...

    internal bool IsNodeReturnTypeExplicitlySet { get; private set; }
    
    public ReturnType NodeReturnType
    {
        get => _nodeReturnType;
        set
        {
            _nodeReturnType = value;
            IsNodeReturnTypeExplicitlySet = true;
        }
    }
    private ReturnType _nodeReturnType;
    
    ...
    
    public XPathAttribute(string xpathString)
    {
        XPath = xpathString;
        //
        // Setting the backing field and NOT the NodeReturnType property itself
        // to avoid IsNodeReturnTypeExplicitlySet becoming true for this constructor.
        //
        _nodeReturnType = ReturnType.InnerText;
    }    
    
    public XPathAttribute(string xpathString, ReturnType nodeReturnType)
    {
        XPath = xpathString;
        //
        // Assigning the NodeReturnType property will
        // set the IsNodeReturnTypeExplicitlySet to true.
        //
        // Alternatively, to maintain similarity with the other constructor, the backing field
        // could be set plus an asssignment of IsNodeReturnTypeExplicitlySet like:
        //    _nodeReturnType = nodeReturnType;
        //    IsNodeReturnTypeExplicitlySet = true;
        //
        NodeReturnType = nodeReturnType;
        
    }

This is done specifically in a way to keep the IsNodeReturnTypeExplicitlySet flag operable even when a user only updates HAP without recompiling their user code/classes that make use of the [XPath] attribute.

The second step necessary is a modification of those call sites for Tools.GetHtmlForEncapsulation that in the previous HAP codebase didn't account for the NodeReturnType. Those call sites should only pass NodeReturnType values that are explicitly specified in an [XPath] attribute notation/instantiation, otherwise they should pass whatever would match the old code behavior, e.g., something like this (i am basing this on your PR #506):

innerHtmlDocument.LoadHtml(Tools.GetHtmlForEncapsulation(htmlNode, xPathAttribute.NodeReturnType));

would change into something like this:

    innerHtmlDocument.LoadHtml(
        Tools.GetHtmlForEncapsulation(
            htmlNode,
            (xPathAttribute.IsNodeReturnTypeExplicitlySet) ? xPathAttribute.NodeReturnType : ReturnType.InnerHtml
        )
    );

and

innerHtmlDocument.LoadHtml(Tools.GetHtmlForEncapsulation(node, xPathAttribute.NodeReturnType));

would in the same manner change into something like this:

    innerHtmlDocument.LoadHtml(
        Tools.GetHtmlForEncapsulation(
            node,
            (xPathAttribute.IsNodeReturnTypeExplicitlySet) ? xPathAttribute.NodeReturnType : ReturnType.InnerHtml
        )
    );

What do you think about this approach?

@elgonzo
Copy link
Contributor

elgonzo commented Jul 20, 2023

As an addendum, i pondered a while whether it is worthwhile to keep backward compatibility here, even though i gave an idea about how to maintain it. The old behavior from the old codebase is quite a bit inconsistent, and inconsistent behavior plus adding something more is at the end of the day still inconsistent behavior, just with some topping on it. :-P And trying to understand (or explain to someone) these inconsistencies in the GetEncapsulatedData behavior when it is used in somewhat complex scenarios could become a bit of a challenge, i fear.

Backwards compatibility is a rather strong argument, and usually i gravitate towards maintaining backwards compatibility unless there is a good reason not to. But i also feel the inconsistencies in the old GetEncapsulatedData implementation (including the related attributes such as [XPath]) related to the NodeReturnType handling are such a good reason to not keep this particular case of backwards compatibility.

Of course, not maintaining backwards compatibility and a breaking change in behavior of an API would only be warranted for a major version (first version number) or second version number increment in my opinion. So, there is that to consider too.

Just my 2c, of course. No strings attached, i am not a stakeholder in HAP, so feel free without worries to completely dismiss my expressed opinion if you disagree. :-)

@rwecho
Copy link
Contributor Author

rwecho commented Jul 21, 2023

@elgonzo I'm very glad that you could come and discuss this issue with me.
If we keep the backward compatibility, we can also modify the code :

innerHtmlDocument.LoadHtml(Tools.GetHtmlForEncapsulation(node, xPathAttribute.NodeReturnType==ReturnType.InnerText ? ReturnType.InnerHtml: xPathAttribute.NodeReturnType)); 

Because in the context of the code, if has checked the HasXPathAttribute

if (targetType.IsDefinedAttribute(typeof(HasXPathAttribute)) == true) // Object has xpath attribute (Defined HasXPath)

To the LoadHtml method, It should want an HTML instead of text (Are there any special cases that I haven't thought of?).

After all, this is a very old project, and I believe that maintaining compatibility is of higher priority. Additionally, we should also listen to @JonathanMagnan's advice.

@elgonzo
Copy link
Contributor

elgonzo commented Jul 21, 2023

@elgonzo I'm very glad that you could come and discuss this issue with me. If we keep the backward compatibility, we can also modify the code :

innerHtmlDocument.LoadHtml(Tools.GetHtmlForEncapsulation(node, xPathAttribute.NodeReturnType==ReturnType.InnerText ? ReturnType.InnerHtml: xPathAttribute.NodeReturnType)); 

I don't think this is a good behavior for a public API. Here's why:

If a user defines an annotation like [XPath("//whatever", NodeReturnType = ReturnType.InnerText)], then the behavior of GetEncapsulatedData should be as mandated by this attribute annotation. Which means, it should honor the explicitly specified ReturnType.InnerText. If the GetEncapsulatedData implementation cannot honor a a specified NodeReturnType value for technical reasons or the GetEncapsulatedData API is deemed to not support ReturnType.InnerText for XPathAttribute annotations of properties that are of a type that is also [HasXPath]-annotated, then it should produce some exception/error in that regard.

In my opinion, the GetEncapsulatedData implementation should not try to be smarter than the user and sneakily override a user-specified NodeReturnType, as this will make declaring a NodeReturnType value in the XPathAttribute annotation ambiguous, thus making it more difficult to read code with XPathAttribute annotations using NodeReturnType values correctly. Correct understanding of such user code would then essentially put a burden on the user (or reviewer) about having to understand the additional context or conditions regarding when which NodeReturnType values would mean exactly what in reality. Given the sparse documentation surrounding GetEncapsulatedData, i would be very hesitant to make its behavior and that of its related attributes "clever", fuzzy and less obvious.


Additionally, we should also listen to @JonathanMagnan's advice.

Oh, i fully agree with you. In the end it's zzzproject / Jonathan's decision how to proceed as they have to maintain the library and deal with user feedback etc.


(A pointless side note :-) Yes, HAP is a very old library, dating back to 2005 or so. But the GetEncapsulatedData API is a baby in comparison, added to HAP "only" in 2018. Still five years, though...)

@rwecho
Copy link
Contributor Author

rwecho commented Jul 25, 2023

@elgonzo As you say, the code should not guess the user purpose, I take your advice and create a PR #509 .
And more , I add a default constructor, to make it possible to get some html from some properties. After all, some front-end writing methods are too flexible these days.

public XPathAttribute(string xpathString, string attributeName, ReturnType nodeReturnType)

@elgonzo @JonathanMagnan We can assess whether this pull request is appropriate

@rwecho
Copy link
Contributor Author

rwecho commented Jul 25, 2023

@elgonzo

resultCastedToTargetPropertyType = Convert.ChangeType(result, propertyInfo.PropertyType);

In the code, it can handle the conversion of DateTime type, but it is unable to convert DateTime? correctly. I don't know how you handle this. In fact, this is just a feature and does not affect the functionality of the code.

Currently, my project has enabled Nullable globally, but it appears that HAP does not adapt well to this configuration.

@elgonzo
Copy link
Contributor

elgonzo commented Jul 25, 2023

resultCastedToTargetPropertyType = Convert.ChangeType(result, propertyInfo.PropertyType);

In the code, it can handle the conversion of DateTime type, but it is unable to convert DateTime? correctly. I don't know how you handle this. In fact, this is just a feature and does not affect the functionality of the code.

Currently, my project has enabled Nullable globally, but it appears that HAP does not adapt well to this configuration.

That's a valid and real issue, but unrelated to NodeReturnType. Best to move this into its own issue report (and which i just did: #510)

@JonathanMagnan
Copy link
Member

Hello @rwecho ,

I will close this issue has the v1.11.51 has been released with your latest fix.

Thank again for your contribution ;)

Best Regards,

Jon

@JonathanMagnan JonathanMagnan self-assigned this Aug 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants