Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

override encoding cannot help when receiving not supported context encoding #326

Open
y-code opened this issue Sep 7, 2019 · 2 comments · May be fixed by #327
Open

override encoding cannot help when receiving not supported context encoding #326

y-code opened this issue Sep 7, 2019 · 2 comments · May be fixed by #327
Assignees

Comments

@y-code
Copy link

y-code commented Sep 7, 2019

Description

When I tried to load a page from https://www.jamieoliver.com/ by HtmlWeb.Load method, it failed with an ArgumentException.

It turned out to be because the response headers from the site has content-encoding: identity. As per HTTP RFC 2616, identity is used only in the Accept- Encoding header, and SHOULD NOT be used in the Content-Encoding header., so that it is of course that Encoding class does not support identity.

Therefore, next, I specified Encoding.UTF8 to OverrideEncoding property and called HtmlDocument.Load method. However, it didn't make any change and I got the same ArgumentException.

I expected OverrideEncoding property make HtmlWeb class to ignore the Content-Encoding in the response headers from server and to decode content by specified encoding in OverrideEncoding property, but it was not the case.

While it allows overriding the encoding specified by server when the encoding name is valid, it would be ideal that it also worked when the server specified encoding name is invalid.

Exception

Exception message:
System.ArgumentException : 'identity' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.
Parameter name: name

Stack trace:
   at System.Text.EncodingTable.GetCodePageFromName(String name)
   at System.Text.Encoding.GetEncoding(String name)
   at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 1680
   at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 2068
   at HtmlAgilityPack.HtmlWeb.Load(Uri uri, String method) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 1290
   at HtmlAgilityPack.HtmlWeb.Load(Uri uri) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 1189
   at HappyFL.Services.WebSeekers.RecipeSeeker.Scan() in /Users/yas/Projects/happyfl/HappyFL/Services/WebSeekers/RecipeSeeker.cs:line 34
   at HappyFL.Services.WebSeekerService.FindRecipes(Uri url, Nullable`1 cancel, Encoding encode) in /Users/yas/Projects/happyfl/HappyFL/Services/WebSeekerService.cs:line 159
   at HappyFL.Test.WebSeekerServiceTest.TestFindRecipe(String url, ExpectedResultForTestFindRecipe expected) in /Users/yas/Projects/happyfl/HappyFLTest/WebSeekerServiceTest.cs:line 167

Project to reproduce issue

https://github.com/y-code/repro-bug-in-html-agility-pack

Further technical details

  • HAP version: 1.11.12
  • NET version (net472, netcore, etc.): .NET Core 2.2.300
@y-code y-code linked a pull request Sep 7, 2019 that will close this issue
@JonathanMagnan JonathanMagnan self-assigned this Sep 9, 2019
@JonathanMagnan
Copy link
Member

Hello @y-code ,

Thank you, we will look at this request and your pull

Best Regards,

Jonathan


Performance Libraries
context.BulkInsert(list, options => options.BatchSize = 1000);
Entity Framework ExtensionsEntity Framework ClassicBulk OperationsDapper Plus

Runtime Evaluation
Eval.Execute("x + y", new {x = 1, y = 2}); // return 3
C# Eval FunctionSQL Eval Function

@James231
Copy link

Hi, are there any updates on this?

This issue is effecting a large number of websites (facebook.com being another example). While wrapping my code in try { } catch{ } does the trick, it is not ideal.

Thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

3 participants