Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature request] Enable nullable reference types #498

Open
Drake53 opened this issue May 15, 2023 · 13 comments
Open

[Feature request] Enable nullable reference types #498

Drake53 opened this issue May 15, 2023 · 13 comments
Assignees

Comments

@Drake53
Copy link

Drake53 commented May 15, 2023

1. Description

Nullable reference types (NRT) is a C# 8.0 language feature that makes it easier to prevent and debug NullReferenceExceptions.

2. Exception

Due to the lack of NRT, you currently don't get a compiler warning when using something like HtmlNode.SelectNodes(). Then when you try to use it in a foreach without a null check, you get a NullReferenceException.

var nodes = htmlNode.SelectNodes("");
foreach (var node in nodes) // exception here if no node matched
{
    // do something
}
@JonathanMagnan
Copy link
Member

Hello @Drake53 ,

It's already possible to fix this issue by turning off the BackwardCompatibility option such as

var doc = new HtmlDocument() {BackwardCompatibility = false};

I will discuss with my developer if we should release a version 2.0 of the library with only this issue fixed. This one is so big and will break so many codes and applications that we never feel comfortable to directly fix it.

Best Regards,

Jon

@zgabi
Copy link
Contributor

zgabi commented May 16, 2023

I think you should get rid of all this backward compatibility things and clean up your code by removing them.
If someone is too lazy to change the breaking changes, then probably too lazy to update the package, too,
So I vote for make a version 2.0 with all compatibility things removed. (Not only adding the NRT-s)

@Drake53
Copy link
Author

Drake53 commented May 16, 2023

Thank you for the response @JonathanMagnan, I was not aware of the BackwardCompatibility field. However I think there may be a misunderstanding, since this does not address my issue which prompted me to create this feature request.

I tested with the following code, but nodes is still null:

var doc = new HtmlAgilityPack.HtmlDocument();
doc.BackwardCompatibility = false;
doc.LoadHtml("<html><body><div>Hello world</div></body></html>");
var nodes = doc.DocumentNode.SelectNodes("img");

NRT is a compile-time feature which helps developers prevent NullReferenceExceptions, but for it to work, libraries such as this need to annotate their code properly. In the above example, the method currently looks like this:
https://github.com/zzzprojects/html-agility-pack/blob/master/src/HtmlAgilityPack.Shared/HtmlNode.Xpath.cs#L40

public HtmlNodeCollection SelectNodes(string xpath)

However, if no node matches, null is returned: https://github.com/zzzprojects/html-agility-pack/blob/master/src/HtmlAgilityPack.Shared/HtmlNode.Xpath.cs#L54
IMO it makes more sense to return an empty list, but that's a separate issue.

With NRT, the method should be updated to indicate that the return value may be null:

public HtmlNodeCollection? SelectNodes(string xpath)

The compiler will then warn me if I try to use the result directly (for example in a foreach loop), prompting me to add a null check first.

Edit: Looking at the source code, I realise that you probably meant OptionEmptyCollection instead of BackwardCompatibility, when I set that to true I no longer get a NullReferenceException in the example code.

@JonathanMagnan
Copy link
Member

Hello @Drake53 ,

Indeed I misunderstood the issue. I'm a little bit too used to the issue of not returning an empty collection.

After some discussion, we choose not to implement it.

The best solution would be for the library to return an empty list. We are still discussing if we really want to release a v2.x with this fix.

The second best solution would be yours. Implementing NRT is a very good suggestion and will surely give at least some hint to many people, but unfortunately, this kind of change for a library as much used as Html Agility Pack can also cause some issues for other developers. From our history with this library, whenever we do this kind of change, that always fires back to us, and we need to roll back as we broke someone else code.

I will keep this issue open until we make our decision.

@Drake53
Copy link
Author

Drake53 commented May 23, 2023

Yeah I can imagine some people getting upset if the compiler suddenly gives them a NRT warning on in this example SelectNodes, even though they have set OptionEmptyCollection to true so it never returns null.
Enabling NRT shouldn't break someone's code though, unless they have set the severity of NRT diagnostics to error instead of the default warning.

@elgonzo
Copy link
Contributor

elgonzo commented May 26, 2023

@JonathanMagnan

TL;DR: Regarding the concern expressed in your comment, the "second best solution" is in reality a far better solution than the "best solution".

i do not agree with your argument. Here is why:

As a foundation i will take your concern of "From our history with this library, whenever we do this kind of change, that always fires back to us, and we need to roll back as we broke someone else code." This is a sensible concern to have and which i wholeheartedly agree with.

However, the rest of your argument, the choice of "best" and "second best" solution do completely go against that concern.

  1. "Best solution": Returning an empty collection instead of null.

    This changes the behavior of this API and therefore poses the exact risk you are concerned about: breaking someone's existing code that relies on the method returning null when no matching nodes are found.

  2. "Second best solution": Using nullabilty annotations for reference types.

    Nullability annotations for reference types do not change the behavior nor the real signature of this API. (They are merely metadata giving hints/information for static code analysis done by the compiler.) Therefore, there is little risk breaking someone's existing code, simply because neither the signature nor the behavior of the method changes.

Considering this, it should become apparent that the "second best solution" is actually a better solution than the "best solution".


@Drake53

Enabling NRT shouldn't break someone's code though, unless they have set the severity of NRT diagnostics to error instead of the default warning.

That's not much of practical consequence and a false premise, though. A (legacy) code project that doesn't use NRT will emit errors/warnings when NRT diagnostics are being enabled regardless of whether that project uses HAP or not. Doing string s = null; will emit a NRT diagnostics error/warning when NRT's are enabled. A possible HAP version utilizing NRT annotations will not make the situation worse.

And if you use NRT diagnostics and blindy relying on the current HAP's SelectNodes method not being able to return null because the return type is not NRT-annotated and therefore you forgo a "manual" null-check, well, i guess the code is not broken then, because no NRT diagnostics errors, right? And then only the act of switching NRT diagnostics to errors will break the code, right? ;-) (I know, i know; like these countless confirmation messageboxes have conditioned us to click blindly on OK on any and every messagebox dialog popping up, NRT diagnostics will condition us to mindlessly sprinkle our code with exclamation marks to make the code analysis shut up, lol...)

@JonathanMagnan
Copy link
Member

Thank you @elgonzo for your insight,

I will surely think about it and reconsider my position over the weekend.

@CodingOctocat
Copy link

Vote +1.

@JonathanMagnan JonathanMagnan self-assigned this Nov 30, 2023
@bdovaz
Copy link

bdovaz commented Dec 8, 2023

@JonathanMagnan what I don't understand is that you talk about breaking code or breaking changes.

NRT the only thing it does is that if I use this library and in my csproj I have set <Nullable>enable</Nullable> I will start to get warnings in cases where I do not control the references that may come null, nothing else:

https://learn.microsoft.com/en-us/dotnet/csharp/nullable-references

If you are willing to implement it I offer to create a PR and implement it.

@bdovaz
Copy link

bdovaz commented Dec 8, 2023

Another thing I don't understand is why so many tremendously obsolete targets such as .NET Framework 4.0, .NET Standard 1.3 / 1.6 (instead of 2.0), Portable Class Library (Windows Phone, really?) or UAP are supported today.

Nowadays it is common to target .NET Standard 2.0 and .NET 6+.

@JonathanMagnan
Copy link
Member

Hello @bdovaz ,

Nowadays it is common to target .NET Standard 2.0 and .NET 6+.

I agree, but we try to be backward compatible. Why do I agree that .NET Standard 1.3 and 1.6 are pretty much obsolete, I don't think dropping .NET Framework 4.0 will be a good thing. A lot of people still use HAP in their legacy projects, and it normally just takes us a few more minutes to deploy, so it's not a big deal either. But yeah, we have too many target hehe

If you are willing to implement it I offer to create a PR and implement it.

Sure go ahead. I cannot guarantee it will be merged but we will sure really consider it.

From what I understand, @elgonzo, you would also HAP go ahead with this solution, right?

Best Regards,

Jon

@elgonzo
Copy link
Contributor

elgonzo commented Dec 13, 2023

I don't think dropping .NET Framework 4.0 will be a good thing. A lot of people still use HAP in their legacy projects [...]
From what I understand, @elgonzo, you would also HAP go ahead with this solution, right?

I tend to agree with going forward with this, for the 2 cents my opinion is worth...

However, i have no experience with compiling source code featuring nullable reference types (which is a C# 8 language feature) for very old build targets like .NET Framework 4.0, and cannot really make a blind yet reasonable assumption about the expected quirks and edge cases in the build process such a build configuration might exhibit.

In general, if support for rather old framework targets (such as those not yet supporting .NET Standard, like for example the mentioned .NET Framework 4.0) prevents you from including useful features in HAP, then it is in my opinion time to ponder whether support for these old framework targets actually still outweighs useful features. I am not discussing here about the costs/efforts required for introducing new features/stuff to the library code. Costs vs benefits is an important consideration for the HAP authors/maintainers, but it's not what my ramblings here are about.

.NET Framework 4.0 went out of support more than seven years ago. In my opinion, devs that still need to maintain legacy projects that require a framework/runtime environment that is out-of-support for many years already should be more than content in also using some old-ish legacy version of HAP (or other libraries) for these old legacy projects. Personally, an argument like a project that is being stuck on .NET Framework 4.0 (or older) and that is not able to migrate to for example 4.7.2 (the earliest .NET framework version fully supporting .NET Standard 2.0 without issues) but at the same time somehow apparently requiring the newest and shiniest version of HAP from the year 2023 doesn't make much sense to me. What i am trying to say is that i wouldn't put too much importance and weight on HAP build targets for rather old frameworks that are out-of-support for more than a couple years already. But then again, this is just my opinion here, so feel free take from it whatever you want... or not... :-)

@MihaMarkic
Copy link

MihaMarkic commented Feb 16, 2024

Even then, targeting .net 4 and .netstandard 2.0 while using nullable reference types at same time might still work. As NRTs are design time sugar. NRTs would be simply ignored on older targets.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

7 participants