[BUG] HTMLAGILITY pack await web.LoadFromWebAsync not working on Windows Servers 2016 and 2019 #442

Open
broomop opened this issue Jun 20, 2021 · 8 comments
@broomop

broomop commented Jun 20, 2021

1. Description

I am getting no response, and it acts as if nothing is happening. I tried the async approach from #171, and it just sits there with the cursor blinking and does nothing.

var web = new HtmlWeb();
web.UsingCache = false;
web.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36";
var doc = await web.LoadFromWebAsync(page);

I tried without the user agent, but it made no difference.

2. Exception

No exceptions, and I cannot debug because I don't have the right tooling on the servers. It works fine on Windows 10.

Exception message:
no exceptions.

3. Fiddle or Project

I am unable to provide one, but I see the traffic spike. It looks like I even receive the data, but nothing is shown.

4. Any further technical details

Add any relevant detail can help us, such as:

  • HAP version: 1.11.34.0
  • .NET version: 4.6.1
@JonathanMagnan JonathanMagnan self-assigned this Jun 21, 2021
@JonathanMagnan
Member

JonathanMagnan commented Jun 21, 2021

Hello @broomop ,

Everything worked when I tried it.

I recommend trying it with ConfigureAwait(false):

await web.LoadFromWebAsync(html).ConfigureAwait(false)

Depending on the type of application, this may be required to avoid a thread deadlock.
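
To make the suggestion above concrete, here is a minimal sketch of a wrapper that applies ConfigureAwait(false). The class name PageLoader and the method LoadAsync are hypothetical names chosen for illustration; only HtmlWeb and LoadFromWebAsync come from HAP. Whether a captured synchronization context is really the cause on the affected servers is not confirmed anywhere in this thread, so treat this as something to try rather than a known fix.

```csharp
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class PageLoader
{
    // Hypothetical helper (not part of HAP): loads a page without requiring
    // the continuation to run on the caller's synchronization context.
    public static async Task<HtmlDocument> LoadAsync(string url)
    {
        var web = new HtmlWeb();

        // ConfigureAwait(false): the code after this await may resume on a
        // thread-pool thread instead of the original (for example UI) context.
        HtmlDocument doc = await web.LoadFromWebAsync(url).ConfigureAwait(false);
        return doc;
    }
}
```

From an async method, this would be called as `var doc = await PageLoader.LoadAsync(page);`. The deadlock this guards against typically appears when a caller blocks on the task (for example with `.Result`) from a thread that owns a synchronization context, so awaiting all the way up is the safer pattern where the application allows it.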

Best Regards,

Jon

@broomop
Author

broomop commented Jun 21, 2021

Did you try this on Windows Server 2016 or 2019? I had no issues on Windows 10, only on the Server editions.

@JonathanMagnan
Member

Yes,

The test was on Windows Server 2016.

It might also be caused by some security policy on your side.

The library is using an HttpClient:

var client = new HttpClient(clientHandler);

So perhaps you could try to grab the text on your side and simply have HAP parse it afterwards. Unfortunately, I don't really see anything that we could change that would help you ;(
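
The "grab the text yourself, then let HAP parse it" idea could look roughly like the sketch below. HtmlFetcher, DownloadAsync, and Parse are hypothetical names for this example; HttpClient.GetStringAsync and HtmlDocument.LoadHtml are the only library calls used. The single shared HttpClient is an assumption of the sketch, not something HAP itself does.

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class HtmlFetcher
{
    // One shared HttpClient for the whole process (assumption of this sketch).
    private static readonly HttpClient Client = new HttpClient();

    // Downloads the raw HTML of the page as a string.
    public static async Task<string> DownloadAsync(string url)
    {
        return await Client.GetStringAsync(url).ConfigureAwait(false);
    }

    // Parses HTML text that is already in memory; no network access here.
    public static HtmlDocument Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }
}
```

Usage from an async method would be `var doc = HtmlFetcher.Parse(await HtmlFetcher.DownloadAsync(page));`. Splitting download and parse also makes it easier to log the raw response on the server and see which of the two steps is stalling.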

@broomop
Author

broomop commented Jun 22, 2021

Hi, it seems that HttpClient isn't well liked anymore. I am trying to see if I can unlock the HttpClient; supposedly it has to do with ASP.NET, using web.config, and allowing any logins, etc. If you can help any further on this, that would be great; otherwise, thanks for your help.

@broomop
Author

broomop commented Jun 22, 2021

After reading some more, I found someone mentioning that HttpClient is not thread-safe the way it is used, and that an HttpResponseMessage should be used as well:

https://docs.microsoft.com/en-us/dotnet/api/system.net.http.httpclient?view=netframework-4.7.2

@broomop
Author

broomop commented Jun 22, 2021

I have figured out that on a Windows Server you have to code it exactly like this and wait for each call to finish:

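	using System;
	using System.Threading.Tasks;
	using HtmlAgilityPack;

	class Program
	{
		// Placeholder: the original post does not show where url is defined.
		static readonly string url = "https://example.com";

		static void Main(string[] args)
		{
			try
			{
				// Block until the async load completes so any exception surfaces here.
				GetHtmlDocumentAsync().GetAwaiter().GetResult();
			}
			catch (Exception ex)
			{
				Console.WriteLine(ex.Message);
			}
			try
			{
				// Synchronous load of the same page, printed for comparison.
				HtmlDocument test = GetHtmlDocument();
				Console.WriteLine(test.Text);
			}
			catch (Exception ex)
			{
				Console.WriteLine(ex.Message);
			}
			Console.ReadLine();
		}

		public static async Task<HtmlDocument> GetHtmlDocumentAsync()
		{
			HtmlWeb web = new HtmlWeb();
			return await web.LoadFromWebAsync(url);
		}

		public static HtmlDocument GetHtmlDocument()
		{
			HtmlWeb web = new HtmlWeb();
			return web.Load(url);
		}
	}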

This is instead of just making lots of await web.LoadFromWebAsync(url); calls with no exception handling.

@broomop
Author

broomop commented Jun 22, 2021

Could I also ask how you would do this with some sort of threaded method so that my application does not lock up?

@JonathanMagnan
Member

Hello @broomop ,

Thank you for the information about the HttpClient not being thread-safe.

In this case, I really recommend that you grab the HTML on your side and just use the LoadHtml method of HtmlDocument to parse it.

The way HAP has been built doesn't currently work with a static HttpClient. So there are some issues that we need to discuss here first to determine how we want to solve this.

However, you can already solve it on your side by fetching the HTML yourself, using the best practices you have already found.
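
For the earlier question about keeping the application from locking up, one possible approach (a sketch under assumptions, not a HAP feature) is to start all downloads without blocking and await them together. ConcurrentLoader, LoadAllAsync, and LoadOneAsync are hypothetical names; the static HttpClient is the workaround discussed above, done on the caller's side.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

public static class ConcurrentLoader
{
    // Shared client reused for every request (assumption of this sketch).
    private static readonly HttpClient Client = new HttpClient();

    // Starts one download per URL and completes when all of them are parsed.
    // No thread is blocked while the requests are in flight.
    public static async Task<HtmlDocument[]> LoadAllAsync(IEnumerable<string> urls)
    {
        IEnumerable<Task<HtmlDocument>> tasks = urls.Select(u => LoadOneAsync(u));
        return await Task.WhenAll(tasks).ConfigureAwait(false);
    }

    // Fetches the HTML text, then parses it in memory with HAP.
    private static async Task<HtmlDocument> LoadOneAsync(string url)
    {
        string html = await Client.GetStringAsync(url).ConfigureAwait(false);
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }
}
```

In a UI application this would be awaited from an async event handler, for example `var docs = await ConcurrentLoader.LoadAllAsync(pages);`, so the UI thread stays free while the pages load. If any single download fails, the awaited task throws, so a try/catch around the await is still needed.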
