SelectNodes().RemoveAt breaks node relationships #419

Open
adeyblue opened this issue Dec 16, 2020 · 1 comment
@adeyblue

Description

Removing an element from the collection returned by SelectNodes causes the siblings of the nodes on either side of the removed one to go missing or to end up under a different parent. Either that, or I'm really misunderstanding something.

Fiddle or Project

// @nuget: HtmlAgilityPack

using System;
using HtmlAgilityPack;
					
public class Program
{
	public static void Main()
	{
		var html = 
		@"<html>
  <head>
    <title>Document</title>
  </head>
  <body>
    <div class=""divClass"">
	  <h3 class=""h3Class"">First Header</h3>
	    <p class=""pClass"">
           Hello
	    </p>
    </div>
	<div class=""divClass"">
	  <h3 class=""h3Class"">Second Header</h3>
	  <p class=""pClass"">
         World!
	  </p>
    </div>
	<div class=""divClass"">
	  <h3 class=""h3Class"">Third Header</h3>
	  <p class=""pClass"">
         Nonsense
	  </p>
    </div>
  </body>
</html>";

		var htmlDoc = new HtmlDocument();
		htmlDoc.LoadHtml(html);
		HtmlNode root = htmlDoc.DocumentNode;
		HtmlNodeCollection headers = root.SelectNodes("//h3[contains(@class, 'h3')]");
		// don't want the last one
		headers.RemoveAt(headers.Count - 1); // without this line, it does what I expect. Both h3's and p's are displayed
		foreach(HtmlNode node in headers)
		{
			Console.WriteLine("Found header: {0}", node.InnerText);
		}
		Console.WriteLine();
		DisplayAllSiblings(headers[0]); // 'p = Hello' should be displayed
		DisplayAllSiblings(headers[1]); // 'p = World!' has gone missing
	}
	
	static void DisplayAllSiblings(HtmlNode node)
	{
		HtmlNode parent = node.ParentNode;
		HtmlNodeCollection coll = parent.SelectNodes("./*");
		
		Console.WriteLine("Siblings of {0}:", node.InnerText);
		foreach(HtmlNode brother in coll)
		{
			Console.WriteLine("Node: {0} = {1}", brother.Name, brother.InnerText.Trim());
		}
		Console.WriteLine();
	}
}

Output of the above when removing the last node:

Found header: First Header
Found header: Second Header

Siblings of First Header:
Node: h3 = First Header
Node: p = Hello

Siblings of Second Header:
Node: h3 = Second Header

Output when changing the RemoveAt call to headers.RemoveAt(1):

Found header: First Header
Found header: Third Header

Siblings of First Header:
Node: h3 = First Header
Node: h3 = Third Header
Node: p = Nonsense

Siblings of Third Header:
Node: h3 = Third Header
Node: p = Nonsense

Further technical details

  • HAP version: whichever version dotnetfiddle uses; also found in 1.8.10
  • .NET version: net472
JonathanMagnan self-assigned this Dec 16, 2020
@JonathanMagnan
Member

Hello @adeyblue,

My developer took the time to look at it, and we recommend using the RemoveChild method instead, for example:

// Last() is a LINQ extension method, so this needs: using System.Linq;
var headerToRemove = headers.Last();
headerToRemove.ParentNode.RemoveChild(headerToRemove);

The problem with using RemoveAt directly is that you are calling the method from List<T>, which doesn't raise the HasChanges method. To make it work, we would need to create our own List<T> class, which I don't think is a good long-term solution.

Make sure to use methods provided by the library instead.

Let me know if that answers this issue.

Best Regards,

Jon
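
Note that RemoveChild deletes the node from the document itself. If the goal is only to skip a header while processing the results (the "don't want the last one" case in the report), one alternative that is not proposed in this thread is to leave the HtmlNodeCollection untouched and trim a plain copy instead. The following is a minimal sketch under that assumption; the shortened HTML and the `wanted` variable are illustrative, and it relies only on HtmlNodeCollection being enumerable as HtmlNode so that the LINQ methods Take and ToList apply.

```csharp
// @nuget: HtmlAgilityPack
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public class Program
{
    public static void Main()
    {
        // Shortened version of the markup from the report (illustrative).
        var html = @"<body>
  <div><h3 class=""h3Class"">First Header</h3><p>Hello</p></div>
  <div><h3 class=""h3Class"">Second Header</h3><p>World!</p></div>
  <div><h3 class=""h3Class"">Third Header</h3><p>Nonsense</p></div>
</body>";

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        HtmlNodeCollection headers =
            htmlDoc.DocumentNode.SelectNodes("//h3[contains(@class, 'h3')]");

        // Assumption: we only want to ignore the last header, not delete it.
        // Take(...).ToList() builds an ordinary List<HtmlNode> holding references
        // to the same nodes, so dropping an entry never touches the document tree.
        List<HtmlNode> wanted = headers.Take(headers.Count - 1).ToList();

        foreach (HtmlNode header in wanted)
        {
            // The header's parent and siblings are still intact here.
            HtmlNode paragraph = header.ParentNode.SelectSingleNode("./p");
            Console.WriteLine("{0}: {1}", header.InnerText, paragraph.InnerText);
        }
    }
}
```

Because the copy is a standard list, RemoveAt, RemoveAll, and similar calls on `wanted` only change which nodes get processed. Use RemoveChild, as recommended above, when the node really should be removed from the document.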
